Full-length human cDNAs encoding potentially secreted proteins

Related application 

The present application is a divisional of U.S. Application Serial No. 09/731,872, filed December 
7, 2000, which claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Applications Serial
Nos. 60/169,629 and 60/187,470, filed December 8, 1999, and March 6, 2000, respectively, the
disclosures of which are incorporated herein by reference in their entireties. 

Field of the invention 

The present invention is directed to polynucleotides encoding GENSET polypeptides, fragments 
thereof, and the regulatory regions located in the 5'- and 3'-ends of the GENSET genes. The invention
also concerns polypeptides encoded by the GENSET polynucleotides and fragments thereof. The present
invention also relates to recombinant vectors, which include the polynucleotides of the present invention, 
particularly recombinant vectors comprising a GENSET regulatory region or a sequence encoding a 
GENSET polypeptide, and to host cells containing the polynucleotides of the invention, as well as to 
methods of making such vectors and host cells. The present invention further relates to the use of these
recombinant vectors and host cells in the production of the polypeptides of the invention. The invention
further relates to antibodies that specifically bind to the polypeptides of the invention and to methods for 
producing such antibodies and fragments thereof. The invention also provides for methods of detecting 
the presence of the polynucleotides and polypeptides of the present invention in a sample, methods of 
diagnosis and screening of abnormal GENSET gene expression and/or biological activity, methods of
screening compounds for their ability to modulate the activity or expression of GENSET genes, and uses
of such compounds. 

Background of the invention 

The estimated 50,000-100,000 genes scattered along the human chromosomes offer tremendous 
promise for the understanding, diagnosis, and treatment of human diseases. In addition, probes capable of 
specifically hybridizing to loci distributed throughout the human genome find applications in the
construction of high resolution chromosome maps and in the identification of individuals. 

Currently, two different approaches are being pursued for identifying and characterizing the genes 
distributed along the human genome. In one approach, large fragments of genomic DNA are isolated, 
cloned, and sequenced. Potential open reading frames in these genomic sequences are identified using 
bio-informatics software. However, this approach entails sequencing large stretches of human DNA
which do not encode proteins in order to find the protein encoding sequences scattered throughout the
genome. In addition to requiring extensive sequencing, the bio-informatics software may mischaracterize 
the genomic sequences obtained, i.e., labeling non-coding DNA as coding DNA and vice versa. 

An alternative approach takes a more direct route to identifying and characterizing human genes. 
In this approach, complementary DNAs (cDNAs) are synthesized from isolated messenger RNAs
(mRNAs) which encode human proteins. Using this approach, sequencing is only performed on DNA
which is derived from protein coding fragments of the genome. In the past, these cDNAs, often short EST
sequences, were obtained from oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded
to the 3' untranslated region of the mRNA. In part, the prevalence of EST sequences derived from the 3'
end of the mRNA is a result of the fact that typical techniques for obtaining cDNAs are not well suited
for isolating cDNA sequences derived from the 5' ends of mRNAs (Adams et al., Nature 377:3-174,
1996; Hillier et al., Genome Res. 6:807-828, 1996). In addition, in those reported instances where longer
cDNA sequences have been obtained, the reported sequences typically correspond to coding sequences
and do not include the full 5' untranslated region (5'UTR) of the mRNA from which the cDNA is derived.
Indeed, 5'UTRs have been shown to affect either the stability or translation of mRNAs. Thus, regulation
of gene expression may be achieved through the use of alternative 5'UTRs as shown, for instance, for the 
translation of the tissue inhibitor of metalloprotease mRNA in mitogenically activated cells (Waterhouse 
et al., J Biol Chem. 265:5585-9, 1990). Furthermore, modification of 5'UTR through mutation, insertion
or translocation events may even be implicated in pathogenesis. For instance, the fragile X syndrome, the
most common cause of inherited mental retardation, is partly due to an insertion of multiple CGG
trinucleotides in the 5'UTR of the fragile X mRNA resulting in the inhibition of protein synthesis via
ribosome stalling (Feng et al., Science 268:731-4, 1995). An aberrant mutation in regions of the 5'UTR
known to inhibit translation of the proto-oncogene c-myc was shown to result in upregulation of c-myc
protein levels in cells derived from patients with multiple myelomas (Willis et al., Curr Top Microbiol
Immunol 224:269-76, 1997). In addition, the use of oligo-dT primed cDNA libraries does not allow the
isolation of complete 5'UTRs since such incomplete sequences obtained by this process may not include 
the first exon of the mRNA, particularly in situations where the first exon is short. Furthermore, they may 
not include some exons, often short ones, which are located upstream of splicing sites. Thus, there is a 
need to obtain sequences derived from the 5' ends of mRNAs. 

Moreover, despite the great amount of EST data that large-scale sequencing projects have yielded
(Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828, 1996), information
concerning the biological function of the mRNAs corresponding to such cDNAs has proven to
be limited. Indeed, whereas the knowledge of the complete coding sequence is absolutely necessary to
investigate the biological function of mRNAs, ESTs yield only partial coding sequences. So far, large-scale
full-length cDNA cloning has been achieved only with limited success because of the poor
efficiency of methods for constructing full-length cDNA libraries. Indeed, such methods require either a
large amount of mRNA (Ederly et al., 1995), thus resulting in non-representative full-length libraries
when only small amounts of tissue are available, or PCR amplification (Maruyama et al., 1994;
CLONTECHniques, 1996) to obtain a reasonable number of clones, thus yielding strongly biased cDNA
libraries where rare and long cDNAs are lost. Thus, there is a need to obtain full-length cDNAs, i.e.,
cDNAs containing the full coding sequence of their corresponding mRNAs. The present application
presents a number of cDNAs, called GENSET polynucleotides, isolated from full-length cDNA libraries
obtained by the methods described in PCT publication WO 00/37491.

While many sequences derived from human chromosomes have practical applications,
approaches based on the identification and characterization of those chromosomal sequences which
encode a protein product are particularly relevant to diagnostic and therapeutic uses. Of the 50,000- 
100,000 protein coding genes, those genes encoding proteins which are secreted from the cell in which 
they are synthesized, as well as the secreted proteins themselves, are particularly valuable as potential
therapeutic agents. Such proteins are often involved in cell to cell communication and may be
responsible for producing a clinically relevant response in their target cells. In fact, several secretory 
proteins, including tissue plasminogen activator, G-CSF, GM-CSF, erythropoietin, human growth 
hormone, insulin, interferon-α, interferon-β, interferon-γ, and interleukin-2, are currently in clinical use.
These proteins are used to treat a wide range of conditions, including acute myocardial infarction, acute
ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney carcinoma,
chemotherapy induced neutropenia and multiple sclerosis. For these reasons, cDNAs encoding secreted 
proteins or fragments thereof represent a particularly valuable source of therapeutic agents. Thus, there is 
a need for the identification and characterization of secreted proteins and the nucleic acids encoding them. 

In addition to being therapeutically useful themselves, secretory proteins include short peptides,
called signal peptides, at their amino termini which direct their secretion. These signal peptides are
encoded by the signal sequences located at the 5' ends of the coding sequences of genes encoding secreted 
proteins. Because these signal peptides will direct the extracellular secretion of any protein to which they 
are operably linked, the signal sequences may be exploited to direct the efficient secretion of any protein 
by operably linking the signal sequences to a gene encoding the protein for which secretion is desired. In
addition, fragments of the signal peptides, called membrane-translocating sequences, may also be used to
direct the intracellular import of a peptide or protein of interest. This may prove beneficial in gene therapy 
strategies in which it is desired to deliver a particular gene product to cells other than the cells in which it 
is produced. Signal sequences encoding signal peptides also find application in simplifying protein 
purification techniques. In such applications, the extracellular secretion of the desired protein greatly 
facilitates purification by reducing the number of undesired proteins from which the desired protein must
be selected. Thus, there exists a need to identify and characterize the 5' fragments of the genes for 
secretory proteins which encode signal peptides. 

Sequences coding for human proteins may also find application as therapeutics or diagnostics. In 
particular, such sequences may be used to determine whether an individual is likely to express a
detectable phenotype, such as a disease, as a consequence of a mutation in the coding sequence for a 
protein. In instances where the individual is at risk of suffering from a disease or other undesirable 
phenotype as a result of a mutation in such a coding sequence, the undesirable phenotype may be 
corrected by introducing a normal coding sequence using gene therapy. Alternatively, if the undesirable
phenotype results from overexpression of the protein encoded by the coding sequence, expression of the
protein may be reduced using antisense or triple helix based strategies. 

The GENSET human polypeptides encoded by the coding sequences may also be used as 
therapeutics by administering them directly to an individual having a condition, such as a disease, 
resulting from a mutation in the sequence encoding the polypeptide. In such an instance, the condition
can be cured or ameliorated by administering the polypeptide to the individual.

In addition, the human polypeptides or fragments thereof may be used to generate antibodies 
useful in determining the tissue type or species of origin of a biological sample. The antibodies may also 
be used to determine the subcellular localization of the human polypeptides or the cellular localization of 
polypeptides which have been fused to the human polypeptides. In addition, the antibodies may also be
used in immunoaffinity chromatography techniques to isolate, purify, or enrich the human polypeptide or
a target polypeptide which has been fused to the human polypeptide. 

Public information on the number of human genes for which the promoters and upstream 
regulatory regions have been identified and characterized is quite limited. In part, this may be due to the 
difficulty of isolating such regulatory sequences. Upstream regulatory sequences such as transcription
factor binding sites are typically too short to be utilized as probes for isolating promoters from human
genomic libraries. Recently, some approaches have been developed to isolate human promoters. One of
them consists of making a CpG island library (Cross et al., Nature Genetics 6:236-244, 1994). The
second consists of isolating human genomic DNA sequences containing SpeI binding sites by the use of
SpeI binding protein (Mortlock et al., Genome Res. 6:327-335, 1996). Both of these approaches have
their limits due to a lack of specificity and of comprehensiveness. Thus, there exists a need to identify
and systematically characterize the 5' fragments of the genes. 

cDNAs including the 5' ends of their corresponding mRNA may be used to efficiently identify 
and isolate 5'UTRs and upstream regulatory regions which control the location, developmental stage, rate, 
and quantity of protein synthesis, as well as the stability of the mRNA (Theil et al., BioFactors 4:87-93,
1993). Once identified and characterized, these regulatory regions may be utilized in gene therapy or
protein purification schemes to obtain the desired amount and locations of protein synthesis or to inhibit, 
reduce, or prevent the synthesis of undesirable gene products. 

In addition, cDNAs containing the 5' ends of protein genes may include sequences useful as 
probes for chromosome mapping and the identification of individuals. Thus, there is a need to identify
and characterize the sequences upstream of the 5' coding sequences of genes encoding proteins. 

Summary of the invention 

The present invention provides compositions containing a purified or isolated polynucleotide 
comprising, consisting of, or consisting essentially of a nucleotide sequence selected from the group
consisting of: (a) the sequences of SEQ ID NOs: 1-241; (b) the sequences of clone inserts of the deposited
clone pool; (c) the full coding sequences of SEQ ID NOs: 1-241; (d) the full coding sequences of the clone
inserts of the deposited clone pool; (e) the sequences encoding one of the polypeptides of SEQ ID NOs:
242-482; (f) the sequences encoding one of the polypeptides encoded by the clone inserts of the deposited
clone pool; (g) the genomic sequences coding for GENSET polypeptides; (h) the 5' transcriptional
regulatory regions of GENSET genes; (i) the 3' transcriptional regulatory regions of GENSET genes; (j)
the polynucleotides comprising the nucleotide sequence of any combination of (g)-(i); (k) the variant
polynucleotides of any of the polynucleotides of (a)-(j); (l) the polynucleotides comprising a nucleotide
sequence of (a)-(k), wherein the polynucleotide is single stranded, double stranded, or a portion is single
stranded and a portion is double stranded; and (m) the polynucleotides comprising a nucleotide sequence
complementary to any of the single stranded polynucleotides of (l). The invention further provides for
fragments of the nucleic acid molecules of (a)-(m) described above. 

The present invention also provides biologically active forms, variants, fragments and derivatives 
of the present proteins, where "biologically active" indicates that the form, variant, fragment, or
derivative has any detectable activity in any in vitro assay known in the art or described herein, or has
any detectable function in vivo. In preferred embodiments, a determination of whether a particular
polypeptide is biologically active will be made based on any of the specific assays or functional 
characteristics provided below for each of the proteins of this invention. 

Therefore, one embodiment of the present invention is a composition containing a purified or 
isolated nucleic acid comprising a sequence selected from the group consisting of sequences of SEQ ID
NOs: 1-241 and sequences of clone inserts of the deposited clone pool, sequences complementary thereto,
allelic variants thereof, and degenerate variants thereof. In one aspect of this embodiment, the nucleic 
acid is recombinant. 
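
By way of illustration only, the notion of a "degenerate variant" referred to above can be sketched in a few lines of Python. The sketch assumes the standard genetic code and treats two coding sequences as degenerate variants of one another when they differ at the nucleotide level yet translate to the same polypeptide. The function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def normalize(seq: str) -> str:
    """Upper-case the sequence and express it in the DNA alphabet."""
    return seq.upper().replace("U", "T")


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = normalize(cds)
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_degenerate_variant(cds_a: str, cds_b: str) -> bool:
    """True if the two sequences differ but encode the same polypeptide."""
    a, b = normalize(cds_a), normalize(cds_b)
    return a != b and translate(a) == translate(b)
```

For instance, the hypothetical sequences ATGGCTTAA and ATGGCCTAA differ only in the third position of the second codon and would be reported as degenerate variants under these assumptions.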

Another embodiment of the present invention is a composition containing a purified or isolated
nucleic acid comprising at least 8 consecutive nucleotides of a sequence selected from the group 
consisting of sequences of SEQ ID NOs: 1-241 and sequences of clone inserts of the deposited clone 
pool, sequences complementary thereto, allelic variants thereof, and degenerate variants thereof. In one 
aspect of this embodiment, the nucleic acid comprises at least 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50,
75, 100, 150, 200, 300, 400, 500, 800, 1000, 1500, or 2000 consecutive nucleotides of said selected
sequence, sequences complementary thereto, allelic variants thereof, and degenerate variants thereof. The
nucleic acid may be a recombinant nucleic acid. 
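
As a non-limiting illustration, the following Python sketch shows one way to test whether a candidate nucleic acid contains at least a given number of consecutive nucleotides of a reference sequence or of the sequence complementary thereto. It assumes plain A/C/G/T strings and exact matching; the function names and the default of 8 nucleotides (taken from the embodiment above) are illustrative choices, not a prescribed method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shares_contiguous_span(query: str, reference: str, min_len: int = 8) -> bool:
    """True if query contains >= min_len consecutive nucleotides of the
    reference or of its reverse complement (exact matches only)."""
    query = query.upper()
    if len(query) < min_len:
        return False
    for strand in (reference.upper(), reverse_complement(reference)):
        # Collect every window of length min_len on this strand ...
        windows = {strand[i:i + min_len] for i in range(len(strand) - min_len + 1)}
        # ... and check whether any window of the query is among them.
        if any(query[i:i + min_len] in windows
               for i in range(len(query) - min_len + 1)):
            return True
    return False
```

A longer minimum span, such as 15 or 20 nucleotides, can be tested by passing a different `min_len`.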

Another embodiment of the present invention is a composition comprising a purified or
isolated vertebrate nucleic acid of at least 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, 1000 or 2000
nucleotides in length which hybridizes under stringent conditions to any polynucleotide of the invention,
preferably a sequence selected from the group consisting of sequences of SEQ ID NOs: 1-241,
sequences of clone inserts of the deposited clone pool, and sequences complementary thereto. In one aspect
of this embodiment, the nucleic acid is recombinant. 

Another embodiment of the present invention is a composition containing a purified or isolated
nucleic acid comprising the full coding sequences of a sequence selected from the group consisting of
sequences of SEQ ID NOs: 1-241 and sequences of clone inserts of the deposited clone pool, or an allelic 
variant thereof. In one aspect of this embodiment, the nucleic acid is recombinant. 

A further embodiment of the present invention is a composition containing a purified or isolated
nucleic acid comprising a contiguous span of a sequence selected from the group consisting of sequences
of SEQ ID NOs: 1-31 and 33-143 and sequences of clone inserts encoding secreted proteins in the 
deposited clone pool, or an allelic variant thereof, wherein said contiguous span encodes a mature protein. 
In one aspect of this embodiment, the nucleic acid is recombinant. In another aspect of this embodiment, 
the nucleic acid is an expression vector wherein said contiguous span which encodes a mature protein is
operably linked to a promoter.

Yet another embodiment of the present invention is a composition containing a purified or 
isolated nucleic acid comprising a contiguous span of a sequence selected from the group consisting of 
sequences of SEQ ID NOs: 1-31 and 33-143 and sequences of clone inserts encoding secreted proteins in 
the deposited clone pool, or an allelic variant thereof, wherein said contiguous span encodes a signal
peptide. In one aspect of this embodiment, the nucleic acid is recombinant. In another aspect of this
embodiment, the nucleic acid is a fusion vector wherein said contiguous span which encodes a signal
peptide is operably linked to a second nucleic acid encoding a heterologous polypeptide.

Another embodiment of the present invention is a composition containing a purified or isolated 
nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of 
sequences of SEQ ID NOs: 1-241 and sequences of clone inserts of the deposited clone pool, or allelic
variant thereof. In one aspect of this embodiment, the nucleic acid is recombinant.

Another embodiment of the present invention is a composition containing a purified or isolated
nucleic acid encoding a polypeptide comprising the sequence of a mature protein included in a sequence
selected from the group consisting of sequences of SEQ ID NOs: 1-31 and 33-143 and sequences of clone
inserts encoding secreted proteins in the deposited clone pool, or allelic variant thereof. In one aspect of
this embodiment, the nucleic acid is recombinant.

Another embodiment of the present invention is a composition containing a purified or isolated
nucleic acid encoding a polypeptide comprising the sequence of a signal peptide included in a sequence
selected from the group consisting of sequences of SEQ ID NOs: 1-31 and 33-143 and sequences of clone
inserts encoding secreted proteins in the deposited clone pool, or allelic variant thereof. In another aspect
it is present in a vector of the invention.

Further embodiments of the invention include compositions containing purified or isolated
polynucleotides that comprise a nucleotide sequence at least 70% identical, more preferably at least 75%
identical, and still more preferably at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to
any of the polynucleotides of the present invention. Methods of determining identity include those well
known in the art and described herein. Such analyses can be performed using a full length polynucleotide
sequence or using a subsequence of any length. For example, any two sequences can be compared over a
region, in either polynucleotide or in both polynucleotides, of any 10, 25, 50, 100, 250, 500, 1000, 2000 or more
contiguous nucleotides. In addition, any two sequences can be identified as homologous even when they
share sequence homology over a limited region of either polynucleotide, for example over a region of at
least about 10, 25, 50, 100, 250, 500, 1000, or more contiguous nucleotides.
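
Purely as an illustrative sketch, and not as the identity-determination method relied upon herein, percent identity over a chosen comparison region might be computed in Python as follows. The sketch assumes that the two sequences have already been aligned to equal length by some alignment method, that gaps are written as '-', and that any column containing a gap counts as a mismatch. These conventions, the function names, and the example sequences are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, start: int = 0, end=None) -> float:
    """Percent identity between two pre-aligned sequences over [start, end).

    Assumption: columns in which either sequence has a gap ('-') are
    counted as mismatches, and the denominator is the region length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a = aligned_a.upper()[start:end]
    b = aligned_b.upper()[start:end]
    if not a:
        raise ValueError("comparison region is empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0,
                             start: int = 0, end=None) -> bool:
    """True if identity over the region is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b, start, end) >= threshold
```

The `start` and `end` arguments restrict the comparison to a subsequence, so that the same pair of sequences can be evaluated over the full length or over a limited region.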

The invention further provides compositions containing a purified or isolated polypeptide
comprising, consisting of, or consisting essentially of an amino acid sequence selected from the group
consisting of: (a) the polypeptides of SEQ ID NOs: 242-482; (b) the polypeptides encoded by the clone
inserts of the deposited clone pool; (c) the epitope-bearing fragments of the polypeptides of SEQ ID NOs:
242-482; (d) the epitope-bearing fragments of the polypeptides encoded by the clone inserts contained in
the deposited clone pool; (e) the domains of the polypeptides of SEQ ID NOs: 242-482; (f) the domains of
the polypeptides encoded by the clone inserts contained in the deposited clone pool; and (g) the allelic
variant polypeptides of any of the polypeptides of (a)-(f). The invention further provides for fragments of
the polypeptides of (a)-(g) above, such as those having biological activity or comprising biologically
functional domain(s).

Yet another embodiment of the present invention is a composition containing a purified or 
isolated protein comprising a sequence selected from the group consisting of sequences of SEQ ID NOs:
242-482 and sequences of polypeptides encoded by clone inserts of the deposited clone pool, or allelic
variant thereof. 

Another embodiment of the present invention is a composition containing a purified or isolated 
polypeptide comprising at least 5, 6 or 8 consecutive amino acids of a sequence selected from the group 
consisting of sequences of SEQ ID NOs: 242-482 and sequences of polypeptides encoded by clone inserts
of the deposited clone pool, or allelic variant thereof. In one aspect of this embodiment, the purified or
isolated polypeptide comprises at least 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, 200, 250, 300, 
350, 400, 450 or 500 consecutive amino acids of said selected sequence or allelic variant thereof. 

Another embodiment of the present invention is a composition containing an isolated or purified
polypeptide comprising a signal peptide of a sequence selected from the group consisting of sequences of
SEQ ID NOs: 242-272 and 274-384 and sequences of polypeptides encoded by clone inserts of the
deposited clone pool, or allelic variant thereof.

Yet another embodiment of the present invention is a composition containing an isolated or 
purified polypeptide comprising a mature protein of a sequence selected from the group consisting of
sequences of SEQ ID NOs: 242-272 and 274-384 and sequences of polypeptides encoded by clone inserts
of the deposited clone pool, or allelic variant thereof.

Further embodiments of the present invention are compositions containing polypeptides having
an amino acid sequence with at least 70% similarity, and more preferably at least 75%, 80%, 85%, 90%, 
95%, 96%, 97%, 98%, or 99% similarity to a polypeptide of the present invention, as well as polypeptides
having an amino acid sequence at least 70% identical, more preferably at least 75% identical, and still
more preferably 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to a polypeptide of the present 
invention. Such analyses can be performed using a full length polypeptide sequence or using a 
subsequence of any length. For example, any two sequences can be compared over a region, in either 
protein or in both proteins, of any 10, 25, 50, 100, 250, 500, 1000, 2000 or more contiguous amino acids.
In addition, any two sequences can be identified as homologous even when they share sequence
homology over a limited region of either protein, for example over a region of at least about 10, 25, 50, 
100, 250, 500, 1000, or more contiguous amino acids. Further included in the invention are compositions 
comprising a purified or isolated nucleic acid molecule encoding such polypeptides. Methods for 
determining identity include those well known in the art and described herein. 

The present invention also relates to compositions comprising recombinant vectors, which include
the purified or isolated polynucleotides of the present invention, and to host cells recombinant for the
polynucleotides of the present invention, as well as to methods of making such vectors and host cells. The 
present invention further relates to the use of these recombinant vectors and recombinant host cells in the 
production of GENSET polypeptides. 

Consequently, another embodiment of the invention is a vector comprising any polynucleotide of
the invention. In a preferred embodiment, the vector is an expression vector comprising a nucleic acid 
sequence encoding a polypeptide selected from the group consisting of sequences of SEQ ID NOs: 242-482
and sequences of polypeptides encoded by the clone inserts of the deposited clone pool, or allelic
variant thereof, wherein said nucleic acid sequence is operably linked to a promoter. In another preferred
embodiment, the vector is a secretion vector comprising a nucleic acid sequence encoding a signal peptide
selected from the group consisting of signal peptides of sequences of SEQ ID NOs: 242-272 and 274-384
and sequences of secreted polypeptides encoded by the clone inserts of the deposited clone pool, or allelic
variant thereof, wherein said nucleic acid sequence is operably linked to a heterologous protein such that
said signal peptide will direct the secretion of said heterologous protein.

A further embodiment of the present invention is a method of making a protein comprising a 
sequence selected from the group consisting of sequences of SEQ ID NOs: 242-482 and sequences of 
polypeptides encoded by clone inserts of the deposited clone pool, comprising the steps of 

a) obtaining a cDNA comprising a sequence selected from the group consisting of sequences of
SEQ ID NOs: 1-241 and sequences of clone inserts of the deposited clone pool;

b) inserting said cDNA in an expression vector such that said cDNA is operably linked to a 
promoter; and 

c) introducing said expression vector into a host cell whereby the host cell produces the protein 
encoded by said cDNA. 

In one aspect of this embodiment, the method further comprises the step of isolating said protein.

Another embodiment of the present invention is a protein obtainable by the method described in 
the preceding paragraph. 

Another embodiment of the present invention is a method of making a protein comprising the 
amino acid sequence of the mature protein contained in a sequence selected from the group consisting of 
sequences of SEQ ID NOs: 242-272 and 274-384 and sequences of polypeptides encoded by clone inserts
of the deposited clone pool, comprising the steps of 

a) obtaining a cDNA comprising a sequence selected from the group consisting of sequences of
SEQ ID NOs: 1-31 and 33-143 and sequences of clone inserts of the deposited clone pool, wherein said
cDNA encodes a mature protein;

b) inserting said cDNA in an expression vector such that said cDNA is operably linked to a
promoter; and

c) introducing said expression vector into a host cell whereby the host cell produces the mature 
protein encoded by said cDNA. 

In one aspect of this embodiment, the method further comprises the step of isolating said protein. 

Another embodiment of the present invention is a mature protein obtainable by the method
described in the preceding paragraph. 

Another embodiment of the present invention is a composition containing a host cell containing 
the purified or isolated nucleic acids comprising a sequence selected from the group consisting of 
sequences of SEQ ID NOs: 1-241 and sequences of clone inserts of the deposited clone pool or a
sequence complementary thereto described herein. 

Another embodiment of the present invention is a composition containing a host cell containing 
the purified or isolated nucleic acids comprising the full coding sequences of a sequence selected from the 
group consisting of sequences of SEQ ID NOs: 1-241 and sequences of clone inserts of the deposited 
clone pool.

Another embodiment of the present invention is a composition containing a host cell containing 
the purified or isolated nucleic acids comprising a contiguous span of a sequence selected from the group 
consisting of sequences of SEQ ID NOs: 1-31 and 33-143 and sequences of clone inserts of the deposited 
clone pool, wherein said contiguous span codes for a mature protein. 

Another embodiment of the present invention is a composition containing a host cell containing
the purified or isolated nucleic acids comprising a contiguous span of a sequence selected from the group
consisting of sequences of SEQ ID NOs: 1-31 and 33-143 and sequences of clone inserts of the deposited 
clone pool, wherein said contiguous span codes for a signal peptide. 

The invention further relates to other methods of making the polypeptides of the present
invention.

The present invention further relates to transgenic plants or animals, wherein said transgenic plant 
or animal is transgenic for a polynucleotide of the present invention and expresses a polypeptide of the 
present invention. 

The invention further relates to compositions comprising antibodies that specifically bind to the 
GENSET polypeptides of the present invention and fragments thereof as well as to methods for producing
such antibodies and fragments thereof. 

Therefore, another embodiment of the present invention is a composition containing a purified or 
isolated antibody capable of specifically binding to a protein comprising a sequence selected from the 
group consisting of sequences of SEQ ID NOs: 242-482 and sequences of polypeptides encoded by clone 
inserts of the deposited clone pool. In one aspect of this embodiment, the antibody is capable of binding
to a polypeptide comprising at least 6 consecutive amino acids, at least 8 consecutive amino acids, or at 
least 10 consecutive amino acids of said selected sequence. 

The invention also provides kits and methods of detecting GENSET gene expression and/or 
biological activity in a biological sample. One such method involves assaying for the expression of a
GENSET polynucleotide in a biological sample using polymerase chain reaction (PCR) to amplify and
detect GENSET polynucleotides or Southern and Northern blot hybridization to detect GENSET genomic
DNA, cDNA or mRNA. Alternatively, a method of detecting GENSET gene expression in a test sample
can be accomplished using a compound which binds to a GENSET polypeptide of the present invention
or a portion of a GENSET polypeptide.

The present invention also relates to diagnostic methods of identifying individuals or non-human
animals having elevated or reduced levels of GENSET products, which individuals are likely to benefit
from therapies to suppress or enhance GENSET gene expression, respectively, and to methods of
identifying individuals or non-human animals at increased risk of developing, or presently having,
certain diseases or disorders associated with abnormal GENSET gene expression or biological activity.

The present invention also relates to kits and methods of screening compounds for their ability to 
modulate (e.g. increase or inhibit) the activity or expression of GENSET genes including compounds that 
interact with GENSET gene regulatory sequences and compounds that interact directly or indirectly with 
GENSET polypeptides. Uses of such compounds are also within the scope of the present invention.

The present invention also relates to pharmaceutical or physiologically acceptable compositions
comprising, as an active agent, the polypeptides, polynucleotides or antibodies of the present invention.

The present invention also relates to computer systems containing cDNA codes and polypeptide
codes of sequences of the invention and to computer-related methods of comparing sequences and identifying
homology or features using GENSET sequences of the invention.

In another aspect, the present invention provides an isolated polynucleotide, said polynucleotide
comprising a nucleic acid sequence encoding i) a polypeptide comprising an amino acid sequence having
at least about 80% identity to any one of the sequences shown as SEQ ID NOs:242-482 or any one of the
sequences of polypeptides encoded by the clone inserts of the deposited clone pool; or ii) a biologically
active fragment of said polypeptide.

In one embodiment, the polypeptide comprises any one of the sequences shown as SEQ ID
NOs:242-482 or any one of the sequences of the polypeptides encoded by the clone inserts of the
deposited clone pool. In another embodiment, the polypeptide comprises a signal peptide. In another
embodiment, the polypeptide is a mature protein. In another embodiment, the nucleic acid sequence has
at least about 80% identity over at least about 100 contiguous nucleotides to any one of the sequences
shown as SEQ ID NOs: 1-241 or any one of the sequences of the clone inserts of the deposited clone pool.
In another embodiment, the polynucleotide hybridizes under stringent conditions to a polynucleotide
comprising any one of the sequences shown as SEQ ID NOs: 1-241 or any one of the sequences of the
clone inserts of the deposited clone pool. In another embodiment, the nucleic acid sequence comprises
any one of the sequences shown as SEQ ID NOs: 1-241 or any one of the sequences of the clone inserts of
the deposited clone pool. In another embodiment, the polynucleotide is operably linked to a promoter.

In another aspect, the present invention provides an expression vector comprising the 
polynucleotide operably linked to a promoter. In another aspect, the present invention provides a host cell 
recombinant for the polynucleotide. In another aspect, the present invention provides a non-human
transgenic animal comprising the host cell. 

In another aspect, the present invention provides a method of making a GENSET polypeptide, the 
method comprising a) providing a population of host cells comprising a herein-described polynucleotide 
and b) culturing the population of host cells under conditions conducive to the production of the 
polypeptide within said host cells.

In one embodiment, the method further comprises purifying the polypeptide from the population 
of host cells. 

In another aspect, the present invention provides a method of making a GENSET polypeptide, the 
method comprising a) providing a population of cells comprising a herein-described polynucleotide; b) 
culturing the population of cells under conditions conducive to the production of the polypeptide within
the cells; and c) purifying the polypeptide from the population of cells. 

In another aspect, the present invention provides an isolated polynucleotide, the polynucleotide 
comprising a nucleic acid sequence having at least about 80% identity over at least about 100 contiguous 
nucleotides to any one of the sequences shown as SEQ ID NOs: 1-241 or any one of the sequences of the 
clone inserts of the deposited clone pool.

In one embodiment, the polynucleotide hybridizes under stringent conditions to a polynucleotide 
comprising any one of the sequences shown as SEQ ID NOs: 1-241 or any one of the sequences of the 
clone inserts of the deposited clone pool. In another embodiment, the polynucleotide comprises any one 
of the sequences shown as SEQ ID NOs: 1-241 or any one of the sequences of the clone inserts of the 
deposited clone pool.

In another aspect, the present invention provides a biologically active polypeptide encoded by any 
of the herein-described polynucleotides. 

In another aspect, the present invention provides an isolated polypeptide or biologically active 
fragment thereof, the polypeptide comprising an amino acid sequence having at least about 80% sequence 
identity to any one of the sequences shown as SEQ ID NOs:242-482 or any one of the sequences of
polypeptides encoded by the clone inserts of the deposited clone pool. 

In one embodiment, the polypeptide is selectively recognized by an antibody raised against an 
antigenic polypeptide, or an antigenic fragment thereof, said antigenic polypeptide comprising any one of 
the sequences shown as SEQ ID NOs:242-482 or any one of the sequences of polypeptides encoded by
the clone inserts of the deposited clone pool. In another embodiment, the polypeptide comprises any one
of the sequences shown as SEQ ID NOs:242-482 or any one of the sequences of polypeptides encoded by 
the clone inserts of the deposited clone pool. In another embodiment, the polypeptide comprises a signal 
peptide. In another embodiment, the polypeptide is a mature protein.

In another aspect, the present invention provides an antibody that specifically binds to any of the
herein-described polypeptides.

In another aspect, the present invention provides a method of determining whether a GENSET 
gene is expressed within a mammal, the method comprising the steps of: a) providing a biological sample 
from said mammal; b) contacting said biological sample with either of: i) a polynucleotide that
hybridizes under stringent conditions to the polynucleotide of claim 1; or ii) a polypeptide that
specifically binds to the polypeptide of claim 19; and c) detecting the presence or absence of 
hybridization between the polynucleotide and an RNA species within the sample, or the presence or 
absence of binding of the polypeptide to a protein within the sample; wherein a detection of the 
hybridization or of the binding indicates that the GENSET gene is expressed within the mammal. 

In one embodiment, the polynucleotide is a primer, and the hybridization is detected by detecting
the presence of an amplification product comprising the sequence of the primer. In another embodiment,
the polypeptide is an antibody. 

In another aspect, the present invention provides a method of determining whether a mammal has
an elevated or reduced level of GENSET gene expression, the method comprising the steps of: a)
providing a biological sample from the mammal; and b) comparing the amount of any of the
herein-described polypeptides, or of an RNA species encoding the polypeptide, within the biological sample
with a level detected in or expected from a control sample; wherein an increased amount of the
polypeptide or the RNA species within the biological sample compared to the level detected in or
expected from the control sample indicates that the mammal has an elevated level of the GENSET gene
expression, and wherein a decreased amount of the polypeptide or the RNA species within the biological
sample compared to the level detected in or expected from the control sample indicates that the mammal
has a reduced level of the GENSET gene expression.

In another aspect, the present invention provides a method of identifying a candidate modulator of
a GENSET polypeptide, the method comprising: a) contacting any of the herein-described polypeptides
with a test compound; and b) determining whether the compound specifically binds to the polypeptide;
wherein a detection that the compound specifically binds to the polypeptide indicates that the compound 
is a candidate modulator of the GENSET polypeptide.

Brief description of drawings

Figure 1 is a map of the expression vector pPT.

Figure 2 is a block diagram of an exemplary computer system. 

Figure 3 is a flow diagram illustrating one embodiment of a process 200 for comparing a new 
nucleotide or protein sequence with a database of sequences in order to determine the identity levels
between the new sequence and the sequences in the database. 

Figure 4 is a flow diagram illustrating one embodiment of a process 250 in a computer for 
determining whether two sequences are homologous. 

Figure 5 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting 
the presence of a feature in a sequence.

Brief Description of Tables 

Table I provides the applicant's internal designation number assigned to each sequence
identification number and indicates whether the sequence is a nucleic acid sequence or a polypeptide
sequence, and in which vector the cDNA was cloned.

Table II provides structural features for each cDNA of SEQ ID Nos: 1-241, i.e., the locations of
the full coding sequences, the signal peptides, the mature polypeptides, the polyA signal and the polyA
site.

Table III lists variants for cDNAs of the present invention. 

Table IV provides the positions of fragments which are preferably excluded from the present 
invention.

Tables Va and Vb provide the positions of fragments which are preferably excluded or included in
the present invention. Table IV, Table Va and Table Vb provide for the inclusion and exclusion of
polynucleotides independently from each other in addition to those described elsewhere in the
specification and are therefore not meant as a limiting description.

Table VI lists known biological structural and functional domains for the polypeptides of the
present invention.

Table VII lists antigenic peaks of predicted antigenic epitopes for polypeptides of the present 
invention. 

Table VIII lists the putative chromosomal location of the polynucleotides of the present 
invention.

Table IX lists Genset's cDNA libraries of tissues and cell types examined that express the
polynucleotides of the present invention.

Table X relates to the bias in spatial distribution of the polynucleotide sequences of the present
invention. 

Table XI lists predicted subcellular localization for cDNAs of the present invention. 

Table XII gives the correspondence between the polynucleotides of the US priority applications, 
namely the US Provisional Patent Applications Serial Nos 60/169,629 and 60/187, (column entitled "Seq
Id No in priority applications") and the polynucleotides of the present application (column entitled "Seq 
Id No in present application"). 

Brief description of sequence listing 

SEQ ID Nos: 1-31 and 33-143 are the nucleotide sequences of cDNAs encoding a potentially
secreted protein. The locations of the ORFs and sequences encoding signal peptides are listed in the
accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide computed as
described below is listed as the "score" in the accompanying Sequence Listing. The sequence of the
signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide
sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a
mature protein. When appropriate, the locations of the first and last nucleotides of the coding sequences
and, where applicable, the locations of the first and last nucleotides of the polyA signals and the
locations of the first and last nucleotides of the polyA sites are indicated.

SEQ ID Nos: 32 and 144-241 are the nucleotide sequences of cDNAs in which no sequence
encoding a signal peptide has been identified to date. However, it remains possible that subsequent
analysis will identify a sequence encoding a signal peptide in these nucleic acids. The locations of the
ORFs are listed in the accompanying Sequence Listing. When appropriate, the locations of the first and
last nucleotides of the coding sequences and, where applicable, the locations of the first and last
nucleotides of the polyA signals and the locations of the first and last nucleotides of the polyA sites
are indicated.

SEQ ID Nos: 242-272 and 274-384 are the amino acid sequences of polypeptides which contain a
signal peptide. These polypeptides are encoded by the cDNAs of SEQ ID Nos: 1-31 and 33-143
respectively. The location of the signal peptide is listed in the accompanying Sequence Listing. 

SEQ ID Nos: 273 and 385-482 are the amino acid sequences of polypeptides in which no signal 
peptide has been identified to date. However, it remains possible that subsequent analysis will identify a 
signal peptide in these polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID
Nos: 32 and 144-241 respectively.

In accordance with the regulations relating to Sequence Listings, the following codes have been
used in the Sequence Listing to describe nucleotide sequences. The code "r" in the sequences indicates
that the nucleotide may be a guanine or an adenine. The code "y" in the sequences indicates that the
nucleotide may be a thymine or a cytosine. The code "m" in the sequences indicates that the nucleotide
may be an adenine or a cytosine. The code "k" in the sequences indicates that the nucleotide may be a
guanine or a thymine. The code "s" in the sequences indicates that the nucleotide may be a guanine or a
cytosine. The code "w" in the sequences indicates that the nucleotide may be an adenine or a thymine.
In addition, all instances of the symbol "n" in the nucleic acid sequences mean that the nucleotide can be
adenine, guanine, cytosine or thymine.
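
The ambiguity codes listed above can be captured in a small lookup table. The following Python sketch is illustrative only: the names `AMBIGUITY_CODES`, `possible_bases` and `count_expansions` are hypothetical and do not appear in the Sequence Listing, and only the seven codes named in the preceding paragraph are covered.

```python
# Ambiguity codes exactly as described in the text above.
# All identifiers here are illustrative assumptions, not part of the specification.
AMBIGUITY_CODES = {
    "r": {"g", "a"},            # guanine or adenine
    "y": {"t", "c"},            # thymine or cytosine
    "m": {"a", "c"},            # adenine or cytosine
    "k": {"g", "t"},            # guanine or thymine
    "s": {"g", "c"},            # guanine or cytosine
    "w": {"a", "t"},            # adenine or thymine
    "n": {"a", "g", "c", "t"},  # any of the four bases
}

UNAMBIGUOUS = {"a", "c", "g", "t"}


def possible_bases(symbol):
    """Return the set of bases that a Sequence Listing symbol may stand for."""
    s = symbol.lower()
    if s in AMBIGUITY_CODES:
        return set(AMBIGUITY_CODES[s])
    if s in UNAMBIGUOUS:
        return {s}
    raise ValueError(f"unsupported symbol: {symbol!r}")


def count_expansions(sequence):
    """Count how many unambiguous sequences an ambiguous sequence could represent."""
    total = 1
    for symbol in sequence:
        total *= len(possible_bases(symbol))
    return total
```

For instance, a sequence written as "acrn" could stand for 1 x 1 x 2 x 4 = 8 distinct unambiguous sequences.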

In some instances, the polypeptide sequences in the Sequence Listing contain the symbol "Xaa." 
These "Xaa" symbols indicate either (1) a residue which cannot be identified because of nucleotide 
sequence ambiguity or (2) a stop codon in the determined sequence where applicants believe one should
not exist (if the sequence were determined more accurately). In some instances, several possible
identities of the unknown amino acids may be suggested by the genetic code. 

In the case of secreted proteins, it should be noted that, in accordance with the regulations 
governing Sequence Listings, in the appended Sequence Listing, the encoded protein (i.e. the protein 
containing the signal peptide and the mature protein or part thereof) extends from an amino acid residue
having a negative number through a positively numbered amino acid residue. Thus, the first amino acid
of the mature protein resulting from cleavage of the signal peptide is designated as amino acid number 1, 
and the first amino acid of the signal peptide is designated with the appropriate negative number. 
However, in the present application, positions on amino acid sequences are always given on the full 
length polypeptide, the first amino acid of the signal peptide being designated as amino acid number 1. 
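
The relationship between the two numbering schemes described above can be sketched as a short conversion routine. This is a minimal illustration under stated assumptions: the function name `listing_to_full_length` is hypothetical, the signal peptide length is supplied by the caller, and the Sequence Listing numbering is assumed to run from -1 directly to 1 with no residue numbered 0.

```python
def listing_to_full_length(listing_position, signal_peptide_length):
    """Convert a Sequence Listing position to a full-length polypeptide position.

    Assumptions (not stated in the text): the listing has no position 0, so the
    last signal peptide residue is -1 and the first mature residue is 1.
    """
    if signal_peptide_length < 0:
        raise ValueError("signal peptide length must be non-negative")
    if listing_position == 0:
        raise ValueError("position 0 is assumed not to exist in the listing")
    if listing_position < -signal_peptide_length:
        raise ValueError("position lies before the start of the signal peptide")
    if listing_position < 0:
        # Signal peptide residue: -L maps to 1, -1 maps to L.
        return listing_position + signal_peptide_length + 1
    # Mature protein residue: 1 maps to L + 1.
    return listing_position + signal_peptide_length
```

With a hypothetical signal peptide of 20 residues, listing position -20 corresponds to position 1 of the full-length polypeptide, and listing position 1 (the first residue of the mature protein) corresponds to position 21.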

Detailed description

Definitions 

Before describing the invention in greater detail, the following definitions are set forth to 
illustrate and define the meaning and scope of the terms used to describe the invention herein. 

The term "GENSET gene", when used herein, encompasses genomic, mRNA and cDNA
sequences encoding the GENSET protein, including the 5' and 3' untranslated regions of said sequences.

As used herein, a "secreted" protein is one which, when expressed in a suitable host cell, is
transported across or through a membrane, including transport as a result of signal peptides in its amino
acid sequence. "Secreted" proteins include without limitation proteins secreted wholly (e.g. soluble
proteins), or partially (e.g. receptors) from the cell in which they are expressed. "Secreted" proteins also
include without limitation proteins which are transported across the membrane of the endoplasmic
reticulum. As used herein, a "mature protein" is the polypeptide fragment generated after the cleavage of
the signal peptide.

The term "full coding sequence" or open reading frame (ORF) of a GENSET gene, when used
herein, refers to the complete coding sequence of said gene. In the case of a secreted protein, the full
coding sequence comprises the coding sequence for the signal peptide and the coding sequence for the
mature polypeptide. Accordingly, the term "full-length polypeptide" refers to the complete polypeptide
encoded by said GENSET gene and in the case of a secreted protein it comprises both the signal peptide
and the mature polypeptide. The positions of the full length polypeptides and, in the case of secreted
proteins, of signal peptides and mature polypeptides are given in the appended sequence listing.

The term "GENSET biological activity" is intended for polypeptides exhibiting an activity
similar, but not necessarily identical, to an activity of the GENSET polypeptide of the invention. The
GENSET biological activity of a given polypeptide may be assessed using a suitable biological assay well
known to those skilled in the art such as the one(s) described herein. In contrast, the term "biological 
activity" refers to any activity that a polypeptide of the invention may have. 

The term "corresponding mRNA" refers to the mRNA which was the template for the cDNA
synthesis which produced a cDNA of the present invention. 

The term "corresponding genomic DNA" refers to the genomic DNA which encodes mRNA
which includes the sequence of one of the strands of the cDNA in which thymidine residues in the
sequence of the cDNA are replaced by uracil residues in the mRNA. 
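
The thymidine-to-uracil relationship between a cDNA strand and its mRNA can be written as a one-line transformation. The sketch below is illustrative; the function name `cdna_strand_to_mrna` is hypothetical, and the input is assumed to be the cDNA strand whose sequence matches the mRNA.

```python
def cdna_strand_to_mrna(cdna_strand):
    """Return the mRNA sequence for a cDNA strand by replacing each T with U.

    Assumes the input is the cDNA strand that has the same sense as the mRNA.
    """
    return cdna_strand.upper().replace("T", "U")
```

For example, the cDNA strand "ATGGCTTAA" corresponds to the mRNA sequence "AUGGCUUAA".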

The term "deposited clone pool" is used herein to refer to the pool of clones entitled
GENSET.071PRF deposited with the ATCC under accession number PTA-1218 on January 21, 2000.

The term "heterologous", when used herein, is intended to designate any polynucleotide or
polypeptide other than the GENSET polynucleotide or polypeptide respectively.

The term "isolated" requires that the material be removed from its original environment (e.g., the
natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or
polypeptide present in a living animal is not isolated, but the same polynucleotide or DNA or polypeptide,
separated from some or all of the coexisting materials in the natural system, is isolated. Such
polynucleotide could be part of a vector and/or such polynucleotide or polypeptide could be part of a 
composition, and still be isolated in that the vector or composition is not part of its natural environment. 
For example, a naturally-occurring polynucleotide present in a living animal is not isolated, but the same 
polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated. 

Specifically excluded from the definition of "isolated" are: naturally-occurring chromosomes (such as
chromosome spreads), artificial chromosome libraries, genomic libraries, and cDNA libraries that exist 
either as an in vitro nucleic acid preparation or as a transfected/transformed host cell preparation, wherein 
the host cells are either an in vitro heterogeneous preparation or plated as a heterogeneous population of 
single colonies. Also specifically excluded are the above libraries wherein a specified polynucleotide
makes up less than 5% of the number of nucleic acid inserts in the vector molecules. Further specifically
excluded are whole cell genomic DNA or whole cell RNA preparations (including said whole cell 
preparations which are mechanically sheared or enzymatically digested). Further specifically excluded are 
the above whole cell preparations as either an in vitro preparation or as a heterogeneous mixture separated 
by electrophoresis (including blot transfers of the same) wherein the polynucleotide of the invention has
not further been separated from the heterologous polynucleotides in the electrophoresis medium (e.g., 
further separating by excising a single band from a heterogeneous band population in an agarose gel or 
nylon blot). 

The term "purified" does not require absolute purity; rather, it is intended as a relative definition.
Purification of starting material or natural material to at least one order of magnitude, preferably two or
three orders, and more preferably four or five orders of magnitude is expressly contemplated. As an 
example, purification from 0.1 % concentration to 10 % concentration is two orders of magnitude. To 
illustrate, individual cDNA clones isolated from a cDNA library have been conventionally purified to 
electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly
either from the library or from total human DNA. The cDNA clones are not naturally occurring as such,
but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger 
RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance 
(cDNA) and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. 
Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from
that library results in an approximately 10^4 to 10^6 fold purification of the native message.
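
The "orders of magnitude" used above to express purification correspond to the base-10 logarithm of the fold enrichment. The following sketch, with the hypothetical function name `orders_of_magnitude`, reproduces the worked example given in the definition (0.1 % to 10 %) and is offered only as an illustration of the arithmetic.

```python
import math


def orders_of_magnitude(initial_concentration, final_concentration):
    """Return the purification expressed in orders of magnitude (log10 of the fold change).

    Both concentrations must be in the same units (for example, percent).
    """
    if initial_concentration <= 0 or final_concentration <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(final_concentration / initial_concentration)
```

Purification from 0.1 % to 10 % is a 100-fold enrichment, i.e. two orders of magnitude; a 10^4 fold purification corresponds to four orders of magnitude.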

The term "purified" is further used herein to describe a polypeptide or polynucleotide of the 
invention which has been separated from other compounds including, but not limited to, polypeptides or 
polynucleotides, carbohydrates, lipids, etc. The term "purified" may be used to specify the separation of 
monomeric polypeptides of the invention from oligomeric forms such as homo- or heterodimers,
trimers, etc. The term "purified" may also be used to specify the separation of covalently closed
polynucleotides from linear polynucleotides. A polynucleotide is substantially pure when at least about
50%, preferably 60 to 75% of a sample exhibits a single polynucleotide sequence and conformation
(linear versus covalently closed). A substantially pure polypeptide or polynucleotide typically comprises
about 50%, preferably 60 to 90% weight/weight of a polypeptide or polynucleotide sample, respectively,
more usually about 95%, and preferably is over about 99% pure. Polypeptide and polynucleotide purity,
or homogeneity, is indicated by a number of means well known in the art, such as agarose or 
polyacrylamide gel electrophoresis of a sample, followed by visualizing a single band upon staining the 
gel. For certain purposes higher resolution can be provided by using HPLC or other means well known in 
the art. As an alternative embodiment, purification of the polypeptides and polynucleotides of the present
invention may be expressed as "at least" a percent purity relative to heterologous polypeptides and
polynucleotides (DNA, RNA or both). As a preferred embodiment, the polypeptides and polynucleotides of
the present invention are at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%,
98%, 99%, or 100% pure relative to heterologous polypeptides and polynucleotides, respectively. As a
further preferred embodiment the polypeptides and polynucleotides have a purity ranging from any
number, to the thousandth position, between 90% and 100% (e.g., a polypeptide or polynucleotide at least 
99.995% pure) relative to either heterologous polypeptides or polynucleotides, respectively, or as a 
weight/weight ratio relative to all compounds and molecules other than those existing in the carrier. Each 
number representing a percent purity, to the thousandth position, may be claimed as individual species of 
purity.

As used interchangeably herein, the terms "nucleic acid molecule(s)", "oligonucleotide(s)", and
"polynucleotide(s)" include RNA or DNA (either single or double stranded, coding, complementary or
antisense), or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex
form (although each of the above species may be particularly specified). The term "nucleotide" is used
herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of
any length in single-stranded or duplex form. More precisely, the expression "nucleotide sequence"
encompasses the nucleic material itself and is thus not restricted to the sequence information (i.e. the 
succession of letters chosen among the four base letters) that biochemically characterizes a specific DNA 
or RNA molecule. The term "nucleotide" is also used herein as a noun to refer to individual nucleotides
or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule,
comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or
phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The term
"nucleotide" is also used herein to encompass "modified nucleotides" which comprise at least one
modification such as (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous
form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines,
pyrimidines, and sugars see for example PCT publication No. WO 95/04064, which disclosure is hereby
incorporated by reference in its entirety. Preferred modifications of the present invention include, but are
not limited to, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine,
4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine,
N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine,
2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine,
5-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic
acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil,
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid,
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl)uracil, and 2,6-diaminopurine. The polynucleotide
sequences of the invention may be prepared by any known method, including synthetic, recombinant, ex 
vivo generation, or a combination thereof, as well as utilizing any purification methods known in the art.
Methylenemethylimino linked oligonucleosides, as well as mixed backbone compounds, may be
prepared as described in U.S. Pat. Nos. 5,378,825; 5,386,023; 5,489,677; 5,602,240; and 5,610,289,
which disclosures are hereby incorporated by reference in their entireties. Formacetal and thioformacetal
linked oligonucleosides may be prepared as described in U.S. Pat. Nos. 5,264,562 and 5,264,564, which
disclosures are hereby incorporated by reference in their entireties. Ethylene oxide linked
oligonucleosides may be prepared as described in U.S. Pat. No. 5,223,618, which disclosure is hereby 
incorporated by reference in its entirety. Phosphinate oligonucleotides may be prepared as described in 
U.S. Pat. No. 5,508,270, which disclosure is hereby incorporated by reference in its entirety. Alkyl 
phosphonate oligonucleotides may be prepared as described in U.S. Pat. No. 4,469,863, which disclosure
is hereby incorporated by reference in its entirety. 3'-Deoxy-3'-methylene phosphonate oligonucleotides
may be prepared as described in U.S. Pat. Nos. 5,610,289 or 5,625,050 which disclosures are hereby 
incorporated by reference in their entireties. Phosphoramidite oligonucleotides may be prepared as 
described in U.S. Pat. No. 5,256,775 or U.S. Pat. No. 5,366,878 which disclosures are hereby 
incorporated by reference in their entireties. Alkylphosphonothioate oligonucleotides may be prepared as
described in published PCT applications WO 94/17093 and WO 94/02499 which disclosures are hereby
incorporated by reference in their entireties. 3'-Deoxy-3'-amino phosphoramidate oligonucleotides may
be prepared as described in U.S. Pat. No. 5,476,925, which disclosure is hereby incorporated by reference 
in its entirety. Phosphotriester oligonucleotides may be prepared as described in U.S. Pat. No. 5,023,243, 
which disclosure is hereby incorporated by reference in its entirety. Borano phosphate oligonucleotides
may be prepared as described in U.S. Pat. Nos. 5,130,302 and 5,177,198 which disclosures are hereby
incorporated by reference in their entireties. 

The term "upstream" is used herein to refer to a location which is toward the 5' end of the
polynucleotide from a specific reference point. 

The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein to
refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in
a manner like that found in double-helical DNA with thymine or uracil residues linked to adenine residues 
by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (See Stryer, 
1995, which disclosure is hereby incorporated by reference in its entirety). 
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The terms " complementary " or " complement thereof are used herein to refer to the sequences of 
polynucleotides which is capable of forming Watson & Crick base pairing with another specified 
polynucleotide throughout the entirety of the complementary region. For the purpose of the present 
invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each 
5 base in the first polynucleotide is paired with its complementary base. Complementary bases are, 
generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym from 
"complementary polynucleotide", "complementary nucleic acid" and "complementary nucleotide 
sequence". These terms are applied to pairs of polynucleotides based solely upon their sequences and not 
any particular set of conditions under which the two polynucleotides would actually bind. Unless 

10 otherwise stated, all complementary polynucleotides are fully complementary on the whole length of the 
considered polynucleotide. 
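
As a minimal sketch of the definition above, the following Python helper tests whether two polynucleotides are fully complementary based solely on their sequences. The function names are hypothetical, and the antiparallel (reverse-complement) orientation is an assumption made for illustration; the text only requires that each base pair with its complementary base over the whole length.

```python
# Illustrative helpers (hypothetical names): sequence-only complementarity test.
# Pairing rules follow the text: A with T (or U), and C with G.
_PAIR = {"A": "T", "T": "A", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement in DNA letters; U in the input pairs with A."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def is_fully_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs with `second` over the full length.

    Assumption: the strands are compared in antiparallel orientation.
    """
    second_as_dna = second.upper().replace("U", "T")
    return reverse_complement(first) == second_as_dna
```

Because the comparison covers the whole length, a sequence that pairs over only part of the other is reported as not complementary, consistent with the "fully complementary" default stated above.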

The terms "polypeptide" and "protein", used interchangeably herein, refer to a polymer of amino
acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included
within the definition of polypeptide. This term also does not specify or exclude chemical or
post-expression modifications of the polypeptides of the invention, although chemical or post-expression
modifications of these polypeptides may be included or excluded as specific embodiments. Therefore, for
example, modifications to polypeptides that include the covalent attachment of glycosyl groups, acetyl
groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide.
Further, polypeptides with these modifications may be specified as individual species to be included in or
excluded from the present invention. The natural or other chemical modifications, such as those listed in
the examples above, can occur anywhere in a polypeptide, including the peptide backbone, the amino acid
side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification
may be present in the same or varying degrees at several sites in a given polypeptide. Also, a given
polypeptide may contain many types of modifications. Polypeptides may be branched, for example, as a
result of ubiquitination, and they may be cyclic, with or without branching. Modifications include
acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of
a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a
lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide
bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of
pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation,
iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation,
prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to
proteins such as arginylation, and ubiquitination (see, for instance, Creighton (1993); Seifter et al.
(1990); Rattan et al. (1992)). Also included within the definition are polypeptides which contain one or
more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids
which only occur naturally in an unrelated biological system, modified amino acids from mammalian
systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art,
both naturally occurring and non-naturally occurring.

As used herein, the terms "recombinant polynucleotide" and "polynucleotide construct" are used
interchangeably to refer to linear or circular, purified or isolated polynucleotides that have been
artificially designed and which comprise at least two nucleotide sequences that are not found as
contiguous nucleotide sequences in their initial natural environment. In particular, these terms mean that
the polynucleotide or cDNA is adjacent to "backbone" nucleic acid to which it is not adjacent in its
natural environment. Additionally, to be "enriched" the cDNAs will represent 5% or more of the number
of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules
according to the present invention include nucleic acids such as expression vectors, self-replicating
nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or
manipulate a nucleic acid insert of interest. Preferably, the enriched cDNAs represent 15% or more of the
number of nucleic acid inserts in the population of recombinant backbone molecules. More preferably,
the enriched cDNAs represent 50% or more of the number of nucleic acid inserts in the population of
recombinant backbone molecules. In a highly preferred embodiment, the enriched cDNAs represent 90%
or more (including any number between 90 and 100%, to the thousandth position, e.g., 99.5%) of the
number of nucleic acid inserts in the population of recombinant backbone molecules.

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been
artificially designed and which comprise at least two polypeptide sequences that are not found as
contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides which
have been expressed from a recombinant polynucleotide.

As used herein, the term "operably linked" refers to a linkage of polynucleotide elements in a
functional relationship. A sequence which is "operably linked" to a regulatory sequence such as a
promoter means that said regulatory element is in the correct location and orientation in relation to the
nucleic acid to control RNA polymerase initiation and expression of the nucleic acid of interest. For
instance, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of
the coding sequence.

As used herein, the term "non-human animal" refers to any non-human animal, including insects,
birds, rodents and more usually mammals. Preferred non-human animals include: primates; farm animals
such as swine, goats, sheep, donkeys, cattle, horses, chickens, rabbits; and rodents, preferably rats or
mice. As used herein, the term "animal" is used to refer to any species in the animal kingdom, preferably
vertebrates, including birds and fish, and more preferably a mammal. Both the terms "animal" and
"mammal" expressly embrace human subjects unless preceded by the term "non-human".

The term "domain" refers to an amino acid fragment with specific biological properties. This
term encompasses all known structural and linear biological motifs. Examples of such motifs include but
are not limited to leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha
helices, beta sheets, signal peptides which direct the secretion of proteins, sites for post-translational
modification, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.

Although they have distinct meanings, the terms "comprising", "consisting of" and "consisting
essentially of" may be interchanged for one another throughout the instant application. The term
"having" has the same meaning as "comprising" and may be replaced with either the term "consisting of"
or "consisting essentially of".

An "amplification product" refers to a product of any amplification reaction, e.g. PCR, RT-PCR, 
LCR, etc. 

A "modulator" of a protein or other compound refers to any agent that has a functional effect on
the protein, including physical binding to the protein, alteration of the quantity or quality of expression
of the protein, alteration of any measurable or detectable activity, property, or behavior of the protein, or
any other interaction with the protein or compound.

A "test compound" can be any molecule that is evaluated for its ability to modulate a protein or
other compound.

Unless otherwise specified in the application, nucleotides and amino acids of polynucleotides and
polypeptides, respectively, of the present invention are contiguous and not interrupted by heterologous
sequences.

Identity Between Nucleic Acids Or Polypeptides 

The terms "percentage of sequence identity" and "percentage homology" are used
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are
determined by comparing two optimally aligned sequences over a comparison window, wherein the
portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions
or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or
deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the
number of positions at which the identical nucleic acid base or amino acid residue occurs in both
sequences to yield the number of matched positions, dividing the number of matched positions by the
total number of positions in the window of comparison and multiplying the result by 100 to yield the
percentage of sequence identity. Homology is evaluated using any of the variety of sequence comparison
algorithms and programs known in the art. Such algorithms and programs include, but are by no means
limited to, TBLASTN, BLASTP, FASTA, TFASTA, CLUSTALW, FASTDB (Pearson and Lipman,
1988; Altschul et al., 1990; Thompson et al., 1994; Higgins et al., 1996; Altschul et al., 1990; Altschul
et al., 1993; Brutlag et al., 1990), the disclosures of which are incorporated by reference in their entireties.
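
The arithmetic in the definition above can be sketched as follows. This is an illustration only: it assumes the two sequences have already been optimally aligned by one of the programs named above, that both aligned strings have the same length, and that gaps are written as "-". The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two pre-aligned sequences.

    Matched positions are divided by the total number of positions in the
    window (gap positions included), then multiplied by 100.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)
```

For example, a window of six positions with five identical bases and one gap gives 5/6, or about 83.3%.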
In a particularly preferred embodiment, protein and nucleic acid sequence homologies are
evaluated using the Basic Local Alignment Search Tool ("BLAST"), which is well known in the art (see,
e.g., Karlin and Altschul, 1990; Altschul et al., 1990, 1993, 1997), the disclosures of which are
incorporated by reference in their entireties. In particular, five specific BLAST programs are used to
perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence
database;

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database;

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide
sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database
translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the
six-frame translations of a nucleotide sequence database.
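
The "six-frame conceptual translation" used by BLASTX, TBLASTN and TBLASTX can be sketched in a few lines of Python. This is not the BLAST implementation; it is a small illustration that assumes the standard genetic code, writes stop codons as "*", and uses hypothetical function names.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Return conceptual translations for frames +1 to +3 and -1 to -3."""
    forward = dna.upper().replace("U", "T")
    reverse = forward.translate(_COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each of the six resulting protein strings would then be compared against a protein database (BLASTX) or against the translated database entries (TBLASTX).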

The BLAST programs identify homologous sequences by identifying similar segments, which are
referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence and a
test sequence which is preferably obtained from a protein or nucleic acid sequence database.
High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix, many of
which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix (Gonnet et al.,
1992; Henikoff and Henikoff, 1993), the disclosures of which are incorporated by reference in their
entireties. Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and
Dayhoff, eds., 1978), the disclosure of which is incorporated by reference in its entirety. The BLAST
programs evaluate the statistical significance of all high-scoring segment pairs identified, and preferably
select those segments which satisfy a user-specified threshold of significance, such as a user-specified
percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated
using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990), the disclosure of
which is incorporated by reference in its entirety. The BLAST programs may be used with the default
parameters or with modified parameters provided by the user.

Another preferred method for determining the best overall match between a query nucleotide
sequence (a sequence of the present invention) and a subject sequence, also referred to as a global
sequence alignment, uses the FASTDB computer program based on the algorithm of
Brutlag et al. (1990), the disclosure of which is incorporated by reference in its entirety. In a sequence
alignment the query and subject sequences are both DNA sequences. An RNA sequence can be compared
by first converting U's to T's. The result of said global sequence alignment is in percent identity. Preferred
parameters used in a FASTDB alignment of DNA sequences to calculate percent identity are:
Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining Penalty=30, Randomization Group Length=0,
Cutoff Score=1, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the subject
nucleotide sequence, whichever is shorter. If the subject sequence is shorter than the query sequence
because of 5' or 3' deletions, not because of internal deletions, a manual correction must be made to the
results. This is because the FASTDB program does not account for 5' and 3' truncations of the subject
sequence when calculating percent identity. For subject sequences truncated at the 5' or 3' ends, relative to
the query sequence, the percent identity is corrected by calculating the number of bases of the query
sequence that are 5' and 3' of the subject sequence, which are not matched/aligned, as a percent of the
total bases of the query sequence. Whether a nucleotide is matched/aligned is determined by results of the
FASTDB sequence alignment. This percentage is then subtracted from the percent identity, calculated by
the above FASTDB program using the specified parameters, to arrive at a final percent identity score.
This corrected score is what is used for the purposes of the present invention. Only nucleotides outside the
5' and 3' nucleotides of the subject sequence, as displayed by the FASTDB alignment, which are not
matched/aligned with the query sequence, are calculated for the purposes of manually adjusting the
percent identity score. For example, a 90 nucleotide subject sequence is aligned to a 100 nucleotide query
sequence to determine percent identity. The deletions occur at the 5' end of the subject sequence and
therefore, the FASTDB alignment does not show a matching/alignment of the first 10 nucleotides at the 5'
end. The 10 unpaired nucleotides represent 10% of the sequence (number of nucleotides at the 5' and 3'
ends not matched/total number of nucleotides in the query sequence) so 10% is subtracted from the
percent identity score calculated by the FASTDB program. If the remaining 90 nucleotides were perfectly
matched the final percent identity would be 90%. In another example, a 90 nucleotide subject sequence is
compared with a 100 nucleotide query sequence. This time the deletions are internal deletions so that
there are no nucleotides on the 5' or 3' ends of the subject sequence which are not matched/aligned with the
query. In this case the percent identity calculated by FASTDB is not manually corrected. Once again,
only nucleotides 5' and 3' of the subject sequence which are not matched/aligned with the query sequence
are manually corrected. No other manual corrections are made for the purposes of the present invention.
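
The manual correction described above reduces to a single subtraction, sketched below. The sketch does not run FASTDB; it assumes the FASTDB percent identity and the counts of unaligned query bases lying beyond each end of the subject have already been read from the alignment. The function and parameter names are hypothetical.

```python
def corrected_percent_identity(
    fastdb_percent: float,
    query_length: int,
    unmatched_5prime: int = 0,
    unmatched_3prime: int = 0,
) -> float:
    """Apply the end-truncation correction to a FASTDB percent identity.

    Only query bases outside the 5' and 3' ends of the subject that are not
    matched/aligned are counted; internal gaps are never counted here.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    overhang = unmatched_5prime + unmatched_3prime
    if min(unmatched_5prime, unmatched_3prime) < 0 or overhang > query_length:
        raise ValueError("unmatched counts must be non-negative and fit the query")
    penalty = 100.0 * overhang / query_length  # percent of total query bases
    return fastdb_percent - penalty
```

With the first worked example from the text (90-base subject, 100-base query, 10 unmatched bases at the 5' end, remaining bases perfectly matched), the call `corrected_percent_identity(100.0, 100, unmatched_5prime=10)` gives 90.0. In the second example the deletions are internal, both counts are zero, and the score is returned unchanged.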

Another preferred method for determining the best overall match between a query amino acid
sequence (a sequence of the present invention) and a subject sequence, also referred to as a global
sequence alignment, uses the FASTDB computer program based on the algorithm of
Brutlag et al. (1990). In a sequence alignment the query and subject sequences are both amino acid
sequences. The result of said global sequence alignment is in percent identity. Preferred parameters used
in a FASTDB amino acid alignment are: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining
Penalty=20, Randomization Group Length=0, Cutoff Score=1, Window Size=sequence length, Gap
Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the subject amino acid sequence,
whichever is shorter. If the subject sequence is shorter than the query sequence due to N- or C-terminal
deletions, not because of internal deletions, the results, in percent identity, must be manually corrected.
This is because the FASTDB program does not account for N- and C-terminal truncations of the subject
sequence when calculating global percent identity. For subject sequences truncated at the N- and
C-termini, relative to the query sequence, the percent identity is corrected by calculating the number of
residues of the query sequence that are N- and C-terminal of the subject sequence, which are not
matched/aligned with a corresponding subject residue, as a percent of the total residues of the query
sequence. Whether a residue is matched/aligned is determined by results of the FASTDB sequence
alignment. This percentage is then subtracted from the percent identity, calculated by the above FASTDB
program using the specified parameters, to arrive at a final percent identity score. This final percent
identity score is what is used for the purposes of the present invention. Only residues to the N- and
C-termini of the subject sequence, which are not matched/aligned with the query sequence, are considered
for the purposes of manually adjusting the percent identity score. That is, only query amino acid residues
outside the farthest N- and C-terminal residues of the subject sequence are considered. For example, a
90 amino acid residue subject sequence is aligned with a 100-residue query sequence to determine percent
identity. The deletion occurs at the N-terminus of the subject sequence and therefore, the FASTDB
alignment does not show a matching/alignment of the first 10 residues at the N-terminus. The 10 unpaired
residues represent 10% of the sequence (number of residues at the N- and C-termini not matched/total
number of residues in the query sequence) so 10% is subtracted from the percent identity score calculated
by the FASTDB program. If the remaining 90 residues were perfectly matched the final percent identity
would be 90%. In another example, a 90-residue subject sequence is compared with a 100-residue query
sequence. This time the deletions are internal so there are no residues at the N- or C-termini of the subject
sequence which are not matched/aligned with the query. In this case the percent identity calculated by
FASTDB is not manually corrected. Once again, only residue positions outside the N- and C-terminal ends
of the subject sequence, as displayed in the FASTDB alignment, which are not matched/aligned with the
query sequence are manually corrected. No other manual corrections are made for the purposes of the
present invention.

The term "percentage of sequence similarity" refers to comparisons between polypeptide
sequences and is determined by comparing two optimally aligned sequences over a comparison window,
wherein the portion of the polypeptide sequence in the comparison window may comprise additions or
deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or
deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the
number of positions at which an identical or equivalent amino acid residue occurs in both sequences to
yield the number of matched positions, dividing the number of matched positions by the total number of
positions in the window of comparison and multiplying the result by 100 to yield the percentage of
sequence similarity. Similarity is evaluated using any of the variety of sequence comparison algorithms
and programs known in the art, including those described above in this section. Equivalent amino acid
residues are defined herein in the "Mutated polypeptides" section.

Polynucleotides of the invention 

The present invention concerns GENSET genomic and cDNA sequences. The present invention
encompasses GENSET genes, polynucleotides comprising GENSET genomic and cDNA sequences, as
well as fragments and variants thereof. These polynucleotides may be purified, isolated, or recombinant.

Also encompassed by the present invention are allelic variants, orthologs, splice variants, and/or
species homologues of the GENSET genes. Procedures known in the art can be used to obtain full-length
genes and cDNAs, allelic variants, splice variants, full-length coding portions, orthologs, and/or species
homologues of genes and cDNAs corresponding to a nucleotide sequence selected from the group
consisting of sequences of SEQ ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool,
using information from the sequences disclosed herein or the clone pool deposited with the ATCC. For
example, allelic variants, orthologs and/or species homologues may be isolated and identified by making
suitable probes or primers from the sequences provided herein and screening a suitable nucleic acid
source for allelic variants and/or the desired homologue using any technique known to those skilled in the
art, including those described in the section entitled "To find similar sequences".

In a specific embodiment, the polynucleotides of the invention are at least 15, 30, 50, 100, 125,
500, or 1000 continuous nucleotides. In another embodiment, the polynucleotides are less than or equal to
300kb, 200kb, 100kb, 50kb, 10kb, 7.5kb, 5kb, 2.5kb, 2kb, 1.5kb, or 1kb in length. In a further
embodiment, polynucleotides of the invention comprise a portion of the coding sequences, as disclosed
herein, but do not comprise all or a portion of any intron. In another embodiment, the polynucleotides
comprising coding sequences do not contain coding sequences of a genomic flanking gene (i.e., 5' or 3' to
the gene of interest in the genome). In other embodiments, the polynucleotides of the invention do not
contain the coding sequence of more than 1000, 500, 250, 100, 75, 50, 25, 20, 15, 10, 5, 4, 3, 2, or 1
naturally occurring genomic flanking gene(s).

Deposited clone pool of the invention

Expression of GENSET genes has been shown to lead to the production of at least one mRNA
species per GENSET gene, the cDNA sequences of which are set forth in the appended sequence listing as SEQ
ID Nos: 1-241. The cDNAs (SEQ ID Nos: 1-241) corresponding to these GENSET mRNA species were
cloned in the vector pBluescript II SK(-) (Stratagene) or one of its derivatives, called pPT (see figure 1).
Cells containing the cloned cDNAs of the present invention are maintained in permanent deposit by the
inventors at Genset, S.A., 24 Rue Royale, 75008 Paris, France. Table I provides the applicant's internal
designation number (column entitled "Internal designation") assigned to each sequence identification
number of SEQ ID Nos: 1-482 (column entitled "Seq Id No") and indicates whether the sequence is a
nucleic acid sequence or a polypeptide sequence (column entitled "Type"), and in which vector the cDNA
was cloned (column entitled "Vector").

Each cDNA can be removed from the Bluescript vector in which it was inserted by performing a
NotI/PstI double digestion to produce the appropriate fragment for each clone, provided the cDNA
sequence of interest does not contain these restriction sites within its sequence. The preferable sites for
cDNA removal for those clones inserted into pPT are MunI and HindIII, the sites used for cloning,
provided the cDNA sequence of interest does not contain these restriction sites within its sequence.
Alternatively, other restriction enzymes of the multicloning site of the vector may be used to recover the
desired insert as indicated by the manufacturer or in figure 1.
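
The proviso that the insert must not itself contain the excision sites can be checked in silico before a digestion is attempted. The sketch below is an illustration under stated assumptions: it uses the commonly published recognition sequences for the four enzymes named above, scans only the insert sequence (not the vector), and uses hypothetical function names.

```python
# Recognition sequences of the enzymes named above. All four are palindromic,
# so scanning a single strand is sufficient.
RECOGNITION_SITES = {
    "NotI": "GCGGCCGC",
    "PstI": "CTGCAG",
    "MunI": "CAATTG",
    "HindIII": "AAGCTT",
}


def internal_sites(cdna: str, enzymes) -> dict:
    """Map each enzyme to the 1-based start positions of its site in the insert."""
    seq = cdna.upper()
    hits = {}
    for enzyme in enzymes:
        site = RECOGNITION_SITES[enzyme]
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)  # allow overlapping occurrences
        hits[enzyme] = positions
    return hits


def can_excise(cdna: str, enzymes) -> bool:
    """True if none of the chosen enzymes cuts inside the insert."""
    return all(not found for found in internal_sites(cdna, enzymes).values())
```

For a clone in the Bluescript vector one would call `can_excise(insert, ["NotI", "PstI"])`, and for a clone in pPT `can_excise(insert, ["MunI", "HindIII"])`; a False result suggests choosing other enzymes from the multicloning site.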

Pools of cells containing the cDNAs of the invention, from which the cells containing a particular
polynucleotide are obtainable, were also deposited with the American Type Culture Collection (ATCC),
10801 University Boulevard, Manassas, VA 20110-2209, United States. Each cDNA clone has been
transfected into separate bacterial cells (E. coli) for these composite deposits. In particular, cells
containing the sequences of SEQ ID Nos: 1-241 were deposited on January 21, 2000 in the pool having
ATCC Accession No. PTA-1218 and designated GENSET.071PRF.

Bacterial cells containing a particular clone can be obtained from the composite deposit as
follows:

An oligonucleotide probe or probes should be designed to the sequence that is known for that
particular clone. This sequence can be derived from the sequences provided herein, or from a
combination of those sequences. The design of the oligonucleotide probe should preferably follow these
parameters:

(a) It should be designed to an area of the sequence which has the fewest ambiguous bases
("N's"), if any;

(b) Preferably, the probe is designed to have a Tm of approximately 80 degrees Celsius (assuming
2 degrees for each A or T and 4 degrees for each G or C). However, probes having melting temperatures
between 40 degrees Celsius and 80 degrees Celsius may also be used provided that specificity is not lost.
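
The Tm rule of thumb in parameter (b) can be written out directly. The sketch below implements only the stated approximation (2 degrees for each A or T, 4 degrees for each G or C); the function names and the range check are hypothetical conveniences, and the estimate ignores salt and probe concentration.

```python
def estimate_tm(probe: str) -> int:
    """Approximate Tm in degrees Celsius: 2 per A or T, 4 per G or C."""
    seq = probe.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        # Ambiguous bases such as N cannot be scored; parameter (a) advises
        # avoiding them in any case.
        raise ValueError("probe contains bases other than A, C, G and T")
    return 2 * at + 4 * gc


def tm_in_usable_range(probe: str, low: int = 40, high: int = 80) -> bool:
    """True if the estimated Tm falls within the 40 to 80 degree window."""
    return low <= estimate_tm(probe) <= high
```

Under this rule a 20-mer composed entirely of G and C reaches the preferred 80 degrees, while a probe with balanced base composition needs roughly 26 to 27 bases to do so.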

The oligonucleotide should preferably be labeled with gamma-[32P]ATP (specific activity 6000
Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling
oligonucleotides. Other labeling techniques can also be used. Unincorporated label should preferably be
removed by gel filtration chromatography or other established methods. The amount of radioactivity
incorporated into the probe should be quantified by measurement in a scintillation counter. Preferably,
specific activity of the resulting probe should be approximately 4 x 10^6 dpm/pmole.

The bacterial culture containing the pool of full-length clones should preferably be thawed and
100 ul of the stock used to inoculate a sterile culture flask containing 25 ml of sterile L-broth containing
ampicillin at 100 ug/ml. The culture should preferably be grown to saturation at 37 degrees Celsius, and
the saturated culture should preferably be diluted in fresh L-broth. Aliquots of these dilutions should
preferably be plated to determine the dilution and volume which will yield approximately 5000 distinct
and well-separated colonies on solid bacteriological media containing L-broth containing ampicillin at
100 ug/ml and agar at 1.5% in a 150 mm petri dish when grown overnight at 37 degrees Celsius. Other
known methods of obtaining distinct, well-separated colonies can also be employed.

Standard colony hybridization procedures should then be used to transfer the colonies to 
nitrocellulose filters and lyse, denature and bake them. 

The filter is then preferably incubated at 65 degrees Celsius for 1 hour with gentle agitation in
6X SSC (20X stock is 175.3 g NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH)
containing 0.5% SDS, 100 ug/ml of yeast RNA, and 10 mM EDTA (approximately 10 ml per 150 mm
filter). Preferably, the probe is then added to the hybridization mix at a concentration greater than or
equal to 1 x 10^6 dpm/ml. The filter is then preferably incubated at 65 degrees Celsius with gentle agitation
overnight. The filter is then preferably washed in 500 ml of 2X SSC/0.1% SDS at room temperature with
gentle shaking for 15 minutes. A further wash with 0.1X SSC/0.5% SDS at 65 degrees Celsius for
30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to autoradiography for
sufficient time to visualize the positives on the X-ray film. Other known hybridization methods can also
be employed.
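
The working solutions above (6X, 2X and 0.1X SSC) are all dilutions of the 20X stock, so the volumes follow from C1 x V1 = C2 x V2. The helper below is a small sketch of that arithmetic with a hypothetical function name; it covers only the SSC component and does not account for the volumes of SDS, yeast RNA or EDTA stocks that are also added.

```python
def dilute_stock(stock_x: float, final_x: float, final_volume_ml: float) -> tuple:
    """Return (ml of stock, ml of diluent) for a simple C1*V1 = C2*V2 dilution."""
    if not 0 < final_x <= stock_x:
        raise ValueError("final concentration must be positive and at most the stock")
    if final_volume_ml <= 0:
        raise ValueError("final volume must be positive")
    stock_ml = final_volume_ml * final_x / stock_x
    return stock_ml, final_volume_ml - stock_ml
```

For the approximately 10 ml of 6X SSC used per 150 mm filter, `dilute_stock(20, 6, 10)` gives 3 ml of 20X stock made up to 10 ml; for 500 ml of 2X SSC wash, `dilute_stock(20, 2, 500)` gives 50 ml of stock and 450 ml of diluent.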

The positive colonies are picked, grown in culture, and plasmid DNA isolated using standard
procedures. The clones can then be verified by restriction analysis, hybridization analysis, or DNA
sequencing. The plasmid DNA obtained using these procedures may then be manipulated using standard
cloning techniques familiar to those skilled in the art.

Alternatively, to recover cDNA inserts from the pool of bacteria, a PCR can be performed on
plasmid DNA isolated using standard procedures and primers designed at both ends of the cDNA
insertion, including primers designed in the multicloning site of the vector. For example, a PCR reaction
may be conducted using universal primers designed by the plasmid provider or using primers which are
specific to the cDNA of interest. In the case of Bluescript SK(-), a PCR reaction may be conducted using a
primer having the sequence GGAAACAGCTATGACCA and a primer having the sequence
GTAAAACGACGGCCAGT. This will produce a DNA fragment including a piece of the multiple
cloning site and the cDNA insert. If a specific cDNA of interest is to be recovered, primers may be
designed in order to be specific for the 5' end and the 3' end of this cDNA using sequence information
available from the appended sequence listing. The PCR product which corresponds to the cDNA of
interest can then be manipulated using standard cloning techniques familiar to those skilled in the art.

Therefore, an object of the invention is an isolated, purified, or recombinant polynucleotide
comprising a nucleotide sequence selected from the group consisting of cDNA inserts of the deposited
clone pool. Moreover, preferred polynucleotides of the invention include purified, isolated, or
recombinant GENSET cDNAs consisting of, consisting essentially of, or comprising a nucleotide
sequence selected from the group consisting of cDNA inserts of the deposited clone pool.

The polynucleotides of SEQ ID NOs: 1-241 may be interchanged with the corresponding
polynucleotides encoded by the human cDNAs of the clone inserts of the deposited clone pool. The
polypeptides of SEQ ID NOs: 242-482 may be interchanged with the corresponding polypeptides
encoded by the human cDNAs of the clone inserts of the deposited clone pool. The correspondence
between the polynucleotides of SEQ ID NOs: 1-241, the polypeptides of SEQ ID NOs: 242-482 and the
clone inserts of the deposited clone pool is given in Table I.

cDNA sequences of the invention 

Another object of the invention is a purified, isolated, or recombinant polynucleotide comprising
a nucleotide sequence selected from the group consisting of sequences of SEQ ID Nos: 1-241,
complementary sequences thereto, and fragments thereof. Moreover, preferred polynucleotides of the
invention include purified, isolated, or recombinant GENSET cDNAs consisting of, consisting essentially
of, or comprising a sequence selected from the group consisting of SEQ ID Nos: 1-241.

The GENSET polynucleotide sequences of SEQ ID Nos: 1-241 were then searched for open reading
frames able to encode polypeptides. The GENSET ORFs were also searched to identify potential signal
sequence motifs using slight modifications of the procedures disclosed in Von Heijne, Nucleic Acids Res.
14:4683-4690, 1986, as described in PCT publication WO 00/37491, the entire disclosures of which are
incorporated herein by reference. The GENSET cDNAs of SEQ ID Nos: 1-31 and 33-143, encoding
polypeptides of SEQ ID Nos: 242-272 and 274-384, were thus found to contain such signal sequences.

Structural parameters of each of the cDNAs of the present invention are described in Table II.
Namely, Table II provides, for each cDNA of SEQ ID Nos: 1-241 referred to by its sequence
identification number (column entitled "Seq Id No"), the locations of the first and last nucleotides of the
coding sequences (listed under the heading "Full Coding Sequence"), and, if applicable, the locations of
the signal sequence and the sequence encoding the mature polypeptide in the case of secreted proteins
(SEQ ID Nos: 1-31 and 33-143) listed under the headings "Signal Sequence" and "Coding Sequence for
the mature Protein" respectively, the locations of the first and last nucleotides of the polyA signals (listed
under the heading "Poly A Signal") and the locations of the first and last nucleotides of the polyA sites
(listed under the heading "Poly A Site").

Accordingly, the full coding sequence (CDS) or open reading frame (ORF) of each cDNA of the
invention refers to the nucleotide sequence beginning with the first nucleotide of the start codon and
ending with the last nucleotide of the stop codon (see column entitled "Full coding sequence" of Table II
for sequences of Seq Id Nos: 1-241). Similarly, the signal sequence of each cDNA of the invention refers
to the nucleotide sequence beginning with the first nucleotide of the start codon and ending with the last
nucleotide of the last codon encoding the signal peptide (see column entitled "Signal sequence" of Table II
for sequences of Seq Id Nos: 1-31 and 33-143) and the coding sequence for the mature polypeptide of each
cDNA of the invention refers to the nucleotide sequence beginning with the first nucleotide of the first
codon encoding the mature polypeptide and ending with the last nucleotide of the stop codon (see column
entitled "Coding sequence for mature protein" of Table II for sequences of Seq Id Nos: 1-31 and 33-143).
Similarly, the 5' untranslated region (or 5'UTR) of each cDNA of the invention refers to the nucleotide
sequence starting at nucleotide 1 and ending at the nucleotide immediately 5' to the first nucleotide of the
start codon. The 3' untranslated region (or 3'UTR) of each cDNA of the invention refers to the nucleotide
sequence starting at the nucleotide immediately 3' to the last nucleotide of the stop codon and ending at
the last nucleotide of the cDNA.
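
The coordinate conventions defined above can be illustrated with a minimal Python sketch. It assumes 1-based, inclusive nucleotide positions, as the definitions suggest; the CdnaFeatures container, the function names and the toy sequence are hypothetical and are not taken from Table II.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CdnaFeatures:
    """Hypothetical record of structural parameters (1-based, inclusive)."""
    cds_start: int                    # first nucleotide of the start codon
    cds_end: int                      # last nucleotide of the stop codon
    signal_end: Optional[int] = None  # last nucleotide encoding the signal peptide


def segment(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end using 1-based inclusive coordinates."""
    return seq[start - 1:end]


def split_regions(seq: str, f: CdnaFeatures) -> Dict[str, str]:
    """Split a cDNA into 5'UTR, CDS and 3'UTR, plus signal/mature parts if known."""
    regions = {
        "5'UTR": segment(seq, 1, f.cds_start - 1),
        "CDS": segment(seq, f.cds_start, f.cds_end),
        "3'UTR": segment(seq, f.cds_end + 1, len(seq)),
    }
    if f.signal_end is not None:
        # Signal sequence: start codon through the last signal-peptide codon.
        regions["signal"] = segment(seq, f.cds_start, f.signal_end)
        # Mature coding sequence: first mature codon through the stop codon.
        regions["mature"] = segment(seq, f.signal_end + 1, f.cds_end)
    return regions


# Toy cDNA (invented): 4-nt 5'UTR, 15-nt CDS, 5-nt 3'UTR.
toy = "GGCC" + "ATGAAACTGGCTTGA" + "TTAAA"
regions = split_regions(toy, CdnaFeatures(cds_start=5, cds_end=19, signal_end=10))
```

In this toy case the signal sequence and the mature coding sequence together reconstitute the full coding sequence, which mirrors the relationship between the corresponding columns of Table II.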

Untranslated regions 

In addition, the invention concerns a purified, isolated, and recombinant nucleic acid comprising a
nucleotide sequence selected from the group consisting of the 5'UTRs of sequences of SEQ ID Nos:
1-241 and sequences of clone inserts of the deposited clone pool, sequences complementary thereto, and
allelic variants thereof. The invention also concerns a purified, isolated, and recombinant nucleic acid
comprising a nucleotide sequence selected from the group consisting of the 3'UTRs of sequences of SEQ
ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool, sequences complementary
thereto, and allelic variants thereof.

These polynucleotides may be used to detect the presence of GENSET mRNA species in a
biological sample using either hybridization or RT-PCR techniques well known to those skilled in the art.

In addition, these polynucleotides may be used as regulatory molecules able to affect the
processing and maturation of the polynucleotide including them (either a GENSET polynucleotide or a
heterologous polynucleotide), preferably the localization, stability and/or translation of said
polynucleotide including them (for a review on UTRs see Decker and Parker, 1995; Derrigo et al., 2000).
In particular, 3'UTRs may be used in order to control the stability of heterologous mRNAs in
recombinant vectors using any methods known to those skilled in the art including Makrides (1999), US
Patents 5,925,56; 5,807,7 and 5,756,264, which disclosures are hereby incorporated by reference in their
entireties.

Coding sequences 

Another object of the invention is an isolated, purified or recombinant polynucleotide comprising
the full coding sequence of a sequence selected from the group consisting of sequences of SEQ ID Nos:
1-241, clone inserts of the deposited clone pool, and variants thereof.

A further object of the invention is an isolated, purified or recombinant polynucleotide encoding a
polypeptide comprising a sequence selected from the group consisting of sequences of SEQ ID Nos:
242-482 and allelic variants thereof. Another object of the invention is an isolated, purified or recombinant
polynucleotide encoding a polypeptide comprising a sequence selected from the group consisting of
polypeptides encoded by cDNA inserts of the deposited clone pool and allelic variants thereof.

In a preferred embodiment, the invention encompasses an isolated, purified or recombinant
polynucleotide encoding a polypeptide comprising a sequence selected from the group consisting of the
mature proteins of SEQ ID Nos: 242-272 and 274-384. In another preferred embodiment, the invention
encompasses an isolated, purified or recombinant polynucleotide encoding a polypeptide comprising a
sequence selected from the group consisting of the signal peptides of SEQ ID Nos: 242-272 and 274-384.

It will be appreciated that should the extent of the full coding sequence differ from that indicated
in the appended sequence listing as a result of a sequencing error, reverse transcription or amplification
error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the
encoded protein, or other biological factors, one skilled in the art would be readily able to identify the
extent of the full coding sequences in the sequences of SEQ ID Nos: 1-241. Accordingly, the scope of
any claims herein relating to nucleic acids containing the full coding sequence of one of SEQ ID Nos:
1-241 is not to be construed as excluding any readily identifiable variations from or equivalents to the full
coding sequences described in the appended sequence listing. Similarly, should the extent of the
polypeptides differ from those indicated in the appended sequence listing as a result of any of the
preceding factors, the scope of claims relating to polypeptides comprising the amino acid sequence of the
polypeptides of SEQ ID Nos: 242-482 is not to be construed as excluding any readily identifiable
variations from or equivalents to the sequences described in the appended sequence listing.

It will be appreciated that should the extent of the coding sequence of the mature protein differ
from that indicated in the appended sequence listing as a result of a sequencing error, reverse transcription
or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic
cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to
identify the extent of the coding sequences for the mature protein in the sequences of SEQ ID Nos: 1-31
and 33-143. Accordingly, the scope of any claims herein relating to nucleic acids containing the coding
sequence for the mature proteins of one of SEQ ID Nos: 1-31 and 33-143 is not to be construed as
excluding any readily identifiable variations from or equivalents to the coding sequences described in the
appended sequence listing. Similarly, should the extent of the mature polypeptides differ from those
indicated in the appended sequence listing as a result of any of the preceding factors, the scope of claims
relating to mature polypeptides comprising the amino acid sequence of the polypeptides of SEQ ID Nos:
242-272 and 274-384 is not to be construed as excluding any readily identifiable variations from or
equivalents to the sequences described in the appended sequence listing.

It will be appreciated that should the extent of the coding sequence of the signal peptide differ
from that indicated in the appended sequence listing as a result of a sequencing error, reverse transcription
or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic
cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to
identify the extent of the coding sequences for the signal peptide in the sequences of SEQ ID Nos: 1-31
and 33-143. Accordingly, the scope of any claims herein relating to nucleic acids containing the signal
sequence of one of SEQ ID Nos: 1-31 and 33-143 is not to be construed as excluding any readily
identifiable variations from or equivalents to the coding sequences described in the appended sequence
listing. Similarly, should the extent of the signal peptides differ from those indicated in the appended
sequence listing as a result of any of the preceding factors, the scope of claims relating to signal peptides
comprising the amino acid sequence of the polypeptides of SEQ ID Nos: 242-272 and 274-384 is not to
be construed as excluding any readily identifiable variations from or equivalents to the sequences
described in the appended sequence listing.

The above disclosed polynucleotides that contain the coding sequence (for the full-length protein
or for the mature protein) of the GENSET genes may be expressed in a desired host cell or a desired host
organism, when such a polynucleotide is placed under the control of suitable expression signals. The
expression signals may be either the expression signals contained in the regulatory regions in the
GENSET genes of the invention or, in contrast, the signals may be exogenous regulatory nucleic
sequences. Such a polynucleotide, when placed under the suitable expression signals, may also be
inserted in a vector for its expression and/or amplification.

Further included in the present invention are polynucleotides encoding the polypeptides of the
present invention that are fused in frame to the coding sequences for additional heterologous amino acid
sequences. Of special interest are polynucleotides comprising GENSET signal sequences fused to a
heterologous polypeptide as described in the section entitled "Secretion vectors". Also included in the
present invention are nucleic acids encoding polypeptides of the present invention together with
additional, non-coding sequences, including for example, but not limited to non-coding 5' and 3'
sequences, vector sequence, sequences used for purification, probing, or priming. For example,
heterologous sequences include transcribed, untranslated sequences that may play a role in transcription,
and mRNA processing, for example, ribosome binding and stability of mRNA. The heterologous
sequences may alternatively comprise additional coding sequences that provide additional functionalities.
Thus, a nucleotide sequence encoding a polypeptide may be fused to a tag sequence, such as a sequence
encoding a peptide that facilitates purification of the fused polypeptide. In certain preferred embodiments
of this aspect of the invention, the tag amino acid sequence is a hexa-histidine peptide, such as the tag
provided in a pQE vector (QIAGEN), among others, many of which are commercially available. For
instance, hexa-histidine provides for convenient purification of the fusion protein (See Gentz et al., 1989),
the disclosure of which is incorporated by reference in its entirety. The "HA" tag is another peptide
useful for purification which corresponds to an epitope derived from the influenza hemagglutinin protein
(See Wilson et al., 1984), the disclosure of which is incorporated by reference in its entirety. As
discussed below, other such fusion proteins include the GENSET protein fused to Fc at the N- or
C-terminus.
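
As an illustration of an in-frame tag fusion, the following Python sketch appends six histidine codons immediately upstream of the stop codon of a coding sequence. The C-terminal placement, the choice of the CAC codon and the function name are assumptions made for the example only; they do not describe the construction of any particular commercial vector.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS6_CODONS = "CAC" * 6  # six histidine codons (CAC chosen arbitrarily over CAT)


def add_c_terminal_his6(cds: str) -> str:
    """Insert a hexa-histidine tag in frame, just before the stop codon.

    The input is assumed to be a complete coding sequence: a multiple of
    three nucleotides long and terminated by a stop codon.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence does not end with a stop codon")
    return cds[:-3] + HIS6_CODONS + cds[-3:]


# Toy coding sequence (invented): Met-Lys-stop.
tagged = add_c_terminal_his6("ATGAAATGA")
```

Because the tag is a whole number of codons and is placed before the stop codon, the reading frame of the original coding sequence is preserved.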

Suitable recombinant vectors that contain a polynucleotide such as described herein are disclosed 
elsewhere in the specification. Expression vectors encoding GENSET polypeptides or fragments thereof 
are described in the section entitled "Preparation of the polypeptides". 

Regulatory sequences of the invention 

As mentioned, the genomic sequence of GENSET genes contains regulatory sequences in the
non-coding 5'-flanking region and possibly in the non-coding 3'-flanking region that border the GENSET
coding regions containing the exons of these genes.

Polynucleotides derived from GENSET 5' and 3' regulatory regions are useful in order to detect
the presence of at least a copy of a genomic nucleotide sequence of the GENSET gene or a fragment
thereof in a test sample.

Preferred regulatory sequences 

Polynucleotides carrying the regulatory elements located at the 5' end and at the 3' end of
GENSET coding regions may be advantageously used to control the transcriptional and translational
activity of a heterologous polynucleotide of interest.

Thus, the present invention also concerns a purified or isolated nucleic acid comprising a
polynucleotide which is selected from the group consisting of the 5' and 3' GENSET regulatory regions,
sequences complementary thereto, regulatory active fragments and variants thereof. The invention also
pertains to a purified or isolated nucleic acid comprising a polynucleotide having at least 95% nucleotide
identity with a polynucleotide selected from the group consisting of GENSET 5' and 3' regulatory
regions, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity and most
preferably 99.8% nucleotide identity with a polynucleotide selected from the group consisting of
GENSET 5' and 3' regulatory regions, sequences complementary thereto, variants and regulatory active
fragments thereof.

Another object of the invention consists of purified, isolated or recombinant nucleic acids 
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined herein, 
with a polynucleotide selected from the group consisting of the nucleotide sequences of GENSET 5' and
3' regulatory regions, sequences complementary thereto, variants and regulatory active fragments thereof.

Preferred fragments of 5' regulatory regions have a length of about 1500 or 1000 nucleotides, 
preferably of about 500 nucleotides, more preferably about 400 nucleotides, even more preferably 300 
nucleotides and most preferably about 200 nucleotides. 

Preferred fragments of 3' regulatory regions are at least 20, 50, 100, 150, 200, 300 or 400 bases in
length.

"Providing" with respect to, e.g. a biological sample, population of cells, etc. indicates that the 
sample, population of cells, etc. is somehow used in a method or procedure. Significantly, "providing" a 
biological sample or population of cells does not require that the sample or cells are specifically isolated 
or obtained for the purposes of the invention, but can instead refer, for example, to the use of a biological 
sample obtained by another individual, for another purpose.

"Regulatory active" polynucleotide derivatives of the 5' regulatory region are polynucleotides
comprising or alternatively consisting of a fragment of said polynucleotide which is functional as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide in a
recombinant cell host. It could act either as an enhancer or as a repressor. For the purpose of the
invention, a nucleic acid or polynucleotide is "functional" as a regulatory region for expressing a
recombinant polypeptide or a recombinant polynucleotide if said regulatory polynucleotide contains
nucleotide sequences which contain transcriptional and translational regulatory information, and such
sequences are "operably linked" to nucleotide sequences which encode the desired polypeptide or the
desired polynucleotide.

The regulatory polynucleotides of the invention may be prepared from the nucleotide sequence of
a GENSET genomic or cDNA sequence, for example, by cleavage using suitable restriction enzymes, or by
PCR. The regulatory polynucleotides may also be prepared by digestion of a GENSET gene-containing
genomic clone by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986), the disclosure of which is
incorporated by reference in its entirety. These regulatory polynucleotides can also be prepared by
nucleic acid chemical synthesis, as described elsewhere in the specification.

The regulatory polynucleotides according to the invention may be part of a recombinant
expression vector that may be used to express a full coding sequence in a desired host cell or host
organism. The recombinant expression vectors according to the invention are described elsewhere in the
specification.

Preferred 5'-regulatory polynucleotides of the invention include 5'-UTRs of GENSET cDNAs, or
regulatory active fragments or variants thereof. More preferred 5'-regulatory polynucleotides of the
invention include sequences selected from the group consisting of 5'-UTRs of sequences of SEQ ID Nos:
1-241, 5'-UTRs of clone inserts of the deposited clone pool, regulatory active fragments and variants
thereof.

Preferred 3'-regulatory polynucleotides of the invention include 3'-UTRs of GENSET cDNAs, or
regulatory active fragments or variants thereof. More preferred 3'-regulatory polynucleotides of the
invention include sequences selected from the group consisting of 3'-UTRs of sequences of SEQ ID Nos:
1-241, 3'-UTRs of clone inserts of the deposited clone pool, regulatory active fragments and variants
thereof.

A further object of the invention consists of a purified or isolated nucleic acid comprising:
a) a polynucleotide comprising a 5' regulatory nucleotide sequence selected from the group
consisting of:

(i) a nucleotide sequence comprising a polynucleotide of a GENSET 5' regulatory region or a
complementary sequence thereto;

(ii) a nucleotide sequence comprising a polynucleotide having at least 95% of nucleotide identity
with the nucleotide sequence of a GENSET 5' regulatory region or a complementary sequence thereto;

(iii) a nucleotide sequence comprising a polynucleotide that hybridizes under stringent
hybridization conditions with the nucleotide sequence of a GENSET 5' regulatory region or a
complementary sequence thereto; and

(iv) a regulatory active fragment or variant of the polynucleotides in (i), (ii) and (iii);

b) a nucleic acid molecule encoding a desired polypeptide or a nucleic acid molecule of interest,
said nucleic acid molecule being operably linked to the polynucleotide defined in (a); and

c) optionally, a polynucleotide comprising a 3'-regulatory polynucleotide, preferably a
3'-regulatory polynucleotide of a GENSET gene.

In a specific embodiment, the nucleic acid defined above includes the 5'-UTR of a GENSET
cDNA, or a regulatory active fragment or variant thereof.

In a second specific embodiment, the nucleic acid defined above includes the 3'-UTR of a 
GENSET cDNA, or a regulatory active fragment or variant thereof. 

The regulatory polynucleotide of the 5' regulatory region, or its regulatory active fragments or 
variants, is operably linked at the 5'-end of the nucleic acid molecule encoding the desired polypeptide or
nucleic acid molecule of interest.

The regulatory polynucleotide of the 3' regulatory region, or its regulatory active fragments or 
variants, is advantageously operably linked at the 3'-end of the nucleic acid molecule encoding the 
desired polypeptide or nucleic acid molecule of interest. 

The desired polypeptide encoded by the above-described nucleic acid may be of various nature or
origin, encompassing proteins of prokaryotic, viral or eukaryotic origin. The polypeptides
expressed under the control of a GENSET regulatory region include bacterial, fungal or viral antigens.
Also encompassed are eukaryotic proteins, including intracellular proteins such as "house keeping"
proteins, membrane-bound proteins such as mitochondrial membrane-bound proteins and cell surface
receptors, and secreted proteins such as endogenous mediators, for example cytokines. The desired
polypeptide may be a heterologous polypeptide or a GENSET protein, especially a protein with an
amino acid sequence selected from the group consisting of sequences of SEQ ID Nos: 242-482, fragments
and variants thereof.

The desired nucleic acids encoded by the above-described polynucleotide, usually an RNA
molecule, may be complementary to a desired coding polynucleotide, for example to a GENSET coding
sequence, and thus useful as an antisense polynucleotide. Such a polynucleotide may be included in a
recombinant expression vector in order to express the desired polypeptide or the desired nucleic acid in
a host cell or in a host organism. Suitable recombinant vectors that contain a polynucleotide such as
described herein are disclosed elsewhere in the specification. When a polynucleotide sequence has been
recombinantly introduced into a host cell, the cell is said to be "recombinant" for the polynucleotide.

Polynucleotide variants

The invention also relates to variants of the polynucleotides described herein and fragments
thereof. "Variants" of polynucleotides, as the term is used herein, are polynucleotides that differ from a
reference polynucleotide. Generally, differences are limited so that the nucleotide sequences of the
reference and the variant are closely similar overall and, in many regions, identical. The present
invention encompasses both allelic variants and degenerate variants.

Examples of variant sequences of polynucleotides of the invention are given in the appended 
sequence listing. Table III lists the sequence identification number of all similar sequences of the 
sequence listing, namely variants. All cDNAs referred to by their sequence identification number on a
given line of the table are thought to be variants of the same GENSET gene.

Allelic variant 

A variant of a polynucleotide may be a naturally occurring variant such as a naturally occurring
allelic variant, or it may be a variant that is not known to occur naturally. By an "allelic variant" is
intended one of several alternate forms of a gene occupying a given locus on a chromosome of an
organism (see Lewin, 1990), the disclosure of which is incorporated by reference in its entirety. Diploid
organisms may be homozygous or heterozygous for an allelic form. Non-naturally occurring variants of
the polynucleotide may be made by art-known mutagenesis techniques, including those applied to
polynucleotides, cells or organisms.

Degenerate variant 

In addition to the isolated polynucleotides of the present invention, and fragments thereof, the
invention further includes polynucleotides which comprise a sequence substantially different from those
described above but which, due to the degeneracy of the genetic code, still encode a GENSET
polypeptide of the present invention. These polynucleotide variants are referred to as "degenerate
variants" throughout the instant application. That is, all possible polynucleotide sequences that encode
the GENSET polypeptides of the present invention are contemplated. This includes the genetic code and
species-specific codon preferences known in the art. Thus, it would be routine for one skilled in the art to
generate the degenerate variants described above, for instance, to optimize codon expression for a
particular host (e.g., change codons in the human mRNA to those preferred by other mammalian or
bacterial host cells).
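
The notion of a degenerate variant can be illustrated with a short Python sketch that recodes a coding sequence with synonymous codons and checks that the encoded polypeptide is unchanged. The standard genetic code is used; the PREFERRED table is a hypothetical host preference invented for the example, and the toy sequence is not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def recode(cds: str, preferred: dict) -> str:
    """Replace each codon by a preferred synonymous codon, where one is given."""
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        new = preferred.get(aa, codon)
        if CODON_TABLE[new] != aa:
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)


# Hypothetical codon preferences for an arbitrary host (illustration only).
PREFERRED = {"L": "CTG", "K": "AAA", "A": "GCG"}

reference = "ATGAAGTTAGCTTGA"            # encodes M K L A followed by a stop
variant = recode(reference, PREFERRED)   # a degenerate variant of reference
```

The two nucleotide sequences differ at several positions, yet translate(reference) and translate(variant) return the same polypeptide, which is the defining property of a degenerate variant.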

Nucleotide changes present in a variant polynucleotide may be silent, which means that they do
not alter the amino acids encoded by the polynucleotide. However, nucleotide changes may also result in
amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the
reference sequence. The substitutions, deletions or additions may involve one or more nucleotides. The
variants may be altered in coding or non-coding regions or both. Alterations in the coding regions may
produce conservative or non-conservative amino acid substitutions, deletions or additions. In the context
of the present invention, preferred embodiments are those in which the polynucleotide variants encode
polypeptides which retain substantially the same biological properties or activities as the GENSET
protein. More preferred polynucleotide variants are those containing conservative substitutions.

Similar polynucleotides 

Another embodiment of the present invention is a purified, isolated or recombinant polynucleotide
which is at least 90%, 95%, 96%, 97%, 98% or 99% identical to a polynucleotide selected from the group
consisting of sequences of SEQ ID Nos: 1-241 and clone inserts of the deposited clone pool. The above
polynucleotides are included regardless of whether they encode a polypeptide having a GENSET
biological activity. This is because even where a particular nucleic acid molecule does not encode a
polypeptide having activity, one of skill in the art would still know how to use the nucleic acid molecule,
for instance, as a hybridization probe or primer. Uses of the nucleic acid molecules of the present
invention that do not encode a polypeptide having GENSET activity include, inter alia, isolating a
GENSET gene or allelic variants thereof from a DNA library, and detecting GENSET mRNA expression
in biological samples suspected of containing GENSET mRNA or DNA by Northern Blot or PCR
analysis.

The present invention is further directed to polynucleotides having sequences at least 50%, 60%,
70%, 80%, 90%, 95%, 96%, 97%, 98% or 99% identical to a polynucleotide selected from the group
consisting of sequences of SEQ ID Nos: 1-241 and clone inserts of the deposited clone pool, where said
polynucleotides do, in fact, encode a polypeptide having a GENSET biological activity. Of course, due to
the degeneracy of the genetic code, one of ordinary skill in the art will immediately recognize that a large
number of the polynucleotides at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99%
identical to a polynucleotide selected from the group consisting of sequences of SEQ ID Nos: 1-241 and
clone inserts of the deposited clone pool will encode a polypeptide having biological activity. In fact,
since degenerate variants of these nucleotide sequences all encode the same polypeptide, this will be clear
to the skilled artisan even without performing the above described comparison assay. It will be further
recognized in the art that, for such nucleic acid molecules that are not degenerate variants, a reasonable
number will also encode a polypeptide having biological activity. This is because the skilled artisan is
fully aware of amino acid substitutions that are either less likely or not likely to significantly affect
protein function (e.g., replacing one aliphatic amino acid with a second aliphatic amino acid), as further
described below. By a polynucleotide having a nucleotide sequence at least, for example, 95% "identical"
to a reference nucleotide sequence of the present invention, it is intended that the nucleotide sequence of
the polynucleotide is identical to the reference sequence except that the polynucleotide sequence may
include up to five point mutations per each 100 nucleotides of the reference nucleotide sequence encoding
the GENSET polypeptide. In other words, to obtain a polynucleotide having a nucleotide sequence at
least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference
sequence may be deleted, inserted, or substituted with another nucleotide. The query sequence may be an
entire sequence selected from the group consisting of sequences of SEQ ID Nos: 1-241 and sequences of
clone inserts of the deposited clone pool, or the ORF (open reading frame) of a polynucleotide sequence
selected from said group, or any fragment specified as described herein.
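
The arithmetic behind the "at least 95% identical" definition can be sketched in Python as follows. The sketch assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and it counts identity over alignment columns; this denominator, the function names and the toy sequences are assumptions made for illustration, and actual comparisons would rely on an alignment program such as those discussed elsewhere in the specification.

```python
def max_differences(reference_length: int, min_identity_pct: int) -> int:
    """Largest number of point differences compatible with the identity level.

    For example, 95% identity allows up to five differences per 100
    nucleotides of the reference sequence.
    """
    return (reference_length * (100 - min_identity_pct)) // 100


def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of alignment columns carrying the same nucleotide in both rows."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for r, q in zip(aligned_ref.upper(), aligned_query.upper())
        if r == q and r != "-"
    )
    return 100.0 * matches / len(aligned_ref)


# Toy 20-nt reference and a query carrying a single substitution (invented).
ref = "ATGGCTAAGCTGACCGATTA"
query = "ATGGTTAAGCTGACCGATTA"
identity = percent_identity(ref, query)
```

With one substitution over 20 aligned positions the identity is 95%, which is exactly the limit returned by max_differences(20, 95).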

Hybridizing Polynucleotides

In another aspect, the invention provides an isolated or purified nucleic acid molecule comprising 
a polynucleotide which hybridizes under stringent hybridization conditions to any polynucleotide of the 
present invention using any methods known to those skilled in the art including those disclosed herein 
and in particular in the "To find similar sequences" section. Also contemplated are nucleic acid
molecules that hybridize to the polynucleotides of the present invention at lower stringency hybridization
conditions, preferably at moderate or low stringency conditions as defined herein. Such hybridizing
polynucleotides may be of at least 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, 1000 or
2000 nucleotides in length.

Of particular interest are the polynucleotides hybridizing to any polynucleotide of the invention
and encoding GENSET polypeptides, particularly GENSET polypeptides exhibiting a GENSET
biological activity.

Of course, a polynucleotide which hybridizes only to polyA+ sequences (such as any 3' terminal 
polyA+ tract of a cDNA shown in the sequence listing), or to a 5' complementary stretch of T (or U) 
residues, would not be included in the definition of "polynucleotide," since such a polynucleotide would 
hybridize to any nucleic acid molecule containing a poly (A) stretch or the complement thereof (e.g.,
practically any double-stranded cDNA clone generated using oligo dT as a primer). 

Complementary polynucleotides 

The invention further provides isolated nucleic acid molecules having a nucleotide sequence fully 
complementary to any polynucleotide of the invention. The present invention encompasses a purified, 
isolated or recombinant polynucleotide having a nucleotide sequence complementary to a sequence
selected from the group consisting of sequences of SEQ ID Nos: 1-241, sequences of clone inserts of the
deposited clone pool and fragments thereof. Such isolated molecules, particularly DNA molecules, are
useful as probes for gene mapping and for identifying GENSET mRNA in a biological sample, for
instance, by PCR or Northern blot analysis.
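
For illustration, a fully complementary sequence can be derived from a DNA sequence as in the Python sketch below. The function names and the toy sequence are hypothetical, and only the four unambiguous DNA bases are handled.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_fully_complementary(a: str, b: str) -> bool:
    """True when b pairs with every nucleotide of a over its whole length."""
    return reverse_complement(a).upper() == b.upper()


probe = reverse_complement("ATGGCC")
```

Applying reverse_complement twice returns the starting sequence, which is a convenient sanity check when designing probes against either strand.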

Polynucleotide fragments

The present invention is further directed to polynucleotides encoding portions or fragments of the
nucleotide sequences described herein. Uses for the polynucleotide fragments of the present invention
include use as probes, primers and molecular weight markers, and for expressing the polypeptide
fragments of the present invention. Fragments include portions of polynucleotides selected from the group consisting of a)
the sequences of SEQ ID Nos: 1-241, b) the genomic GENSET sequences, c) the polynucleotides
encoding a polypeptide selected from the group consisting of the sequences of SEQ ID Nos: 242-482, d)
the sequences of clone inserts of the deposited clone pool, and e) the polynucleotides encoding the
polypeptides encoded by the clone inserts of the deposited clone pool. Particularly included in the present
invention is a purified or isolated polynucleotide comprising at least 8 consecutive bases of a
polynucleotide of the present invention. In one aspect of this embodiment, the polynucleotide comprises
at least 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 800, 1000, 1500, or
2000 consecutive nucleotides of a polynucleotide of the present invention.

In addition to the above preferred polynucleotide sizes, further preferred sub-genuses of 
polynucleotides comprise at least 8 nucleotides, wherein "at least 8" is defined as any integer between 8 
and the integer representing the 3' most nucleotide position as set forth in the sequence listing or 
elsewhere herein. Further included as preferred polynucleotides of the present invention are 
polynucleotide fragments at least 8 nucleotides in length, as described above, that are further specified in 
terms of their 5' and 3' position. The 5' and 3' positions are represented by the position numbers set forth 
in the appended sequence listing. For allelic, degenerate and other variants, position 1 is defined as the 5' 
most nucleotide of the ORF, i.e., the nucleotide "A" of the start codon with the remaining nucleotides 
numbered consecutively. Therefore, every combination of a 5' and 3' nucleotide position that a 
polynucleotide fragment of the present invention, at least 8 contiguous nucleotides in length, could 
occupy on a polynucleotide of the invention is included in the invention as an individual species. The 
polynucleotide fragments specified by 5' and 3' positions can be immediately envisaged and are therefore 
not individually listed solely for the purpose of not unnecessarily lengthening the specifications. 

It is noted that the above species of polynucleotide fragments of the present invention may 
alternatively be described by the formula "a to b"; where "a" equals the 5' most nucleotide position and 
"b" equals the 3' most nucleotide position of the polynucleotide; and further where "a" equals an integer 
between 1 and the number of nucleotides of the polynucleotide sequence of the present invention minus 8, 
and where "b" equals an integer between 9 and the number of nucleotides of the polynucleotide sequence 
of the present invention; and where "a" is an integer smaller than "b" by at least 8. 
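
The "a to b" formula above can be sketched in a few lines of Python. This is only an illustration of 
the stated constraints (1 <= a <= N - 8, 9 <= b <= N, and a smaller than b by at least 8); the function 
names and the parameter `min_gap` are hypothetical and do not appear in the specification. 

```python
def fragment_positions(n, min_gap=8):
    """Yield every (a, b) pair allowed by the "a to b" formula.

    n is the number of nucleotides of the polynucleotide; a is the 5' most
    position and b is the 3' most position, with a smaller than b by at
    least min_gap (8 in the text).
    """
    for a in range(1, n - min_gap + 1):          # a runs from 1 to n - 8
        for b in range(a + min_gap, n + 1):      # b runs from a + 8 to n
            yield a, b


def count_fragments(n, min_gap=8):
    """Closed-form count of the pairs produced by fragment_positions."""
    m = n - min_gap
    return m * (m + 1) // 2 if m > 0 else 0
```

For a hypothetical sequence of 10 nucleotides this yields only the pairs (1, 9), (1, 10) and (2, 10), 
which shows why the individual species are described by formula rather than listed. 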

The present invention also provides for the exclusion of any species of polynucleotide fragments 
of the present invention specified by 5' and 3' positions or sub-genuses of polynucleotides specified by 
size in nucleotides as described above. Any number of fragments specified by 5' and 3' positions or by 
size in nucleotides, as described above, may be excluded. Specifically excluded from the invention are 
the fragments described in Table IV. For these cDNAs referred to by their sequence identification 
numbers, Table IV gives the positions of excluded fragments within these sequences, namely fragments having 
substantial homology to polyadenylation tails and to repeated sequences including Alu, L1, THE and 
MER repeats, SSTR sequences or satellite, micro-satellite, and telomeric repeats. Each fragment is 
represented by a-b where a and b are the start and end positions respectively of a given excluded 
fragment. Excluded fragments are separated from each other by a comma. As used herein the term 
"polynucleotide described in Table IV" refers to all polynucleotide fragments defined in Table IV in this 
manner. 
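
As an illustration of the a-b notation just described, the following Python sketch parses a 
comma-separated cell into (start, end) pairs. The function names are hypothetical, and the example 
values used with them are invented for illustration; they are not taken from Table IV. 

```python
def parse_fragments(cell):
    """Parse a cell such as '12-45, 100-230' into [(12, 45), (100, 230)]."""
    fragments = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty cell or a trailing comma
        start, end = (int(part) for part in item.split("-"))
        if start > end:
            raise ValueError(f"start after end in {item!r}")
        fragments.append((start, end))
    return fragments


def is_listed(a, b, fragments):
    """Return True if the fragment a-b is exactly one of the listed fragments."""
    return (a, b) in fragments
```

With the invented cell "12-45, 100-230", `parse_fragments` returns two fragments, and 
`is_listed(12, 45, ...)` reports that the first of them is among the listed species. 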

Preferred included and excluded polynucleotide fragments of the invention are also described in 
Tables Va and Vb. For these cDNAs referred to by their sequence identification numbers, Tables 
Va and Vb give the positions of preferred fragments within these sequences (columns entitled 
"Preferentially included fragments") as well as the positions of preferentially excluded fragments 
(columns entitled "Preferentially excluded fragments"). Each fragment is represented by a-b where a and 
b are the start and end positions respectively of a given preferred fragment. Fragments are separated from 
each other by a comma. As used herein the term "excluded polynucleotide described in Tables Va and Vb" 
refers to all polynucleotide fragments preferentially excluded as described in Tables Va and Vb. As used herein the 
term "preferred polynucleotide described in Tables Va and Vb" refers to all preferentially included 
polynucleotide fragments listed in Tables Va and Vb in this manner. 

Therefore, the present invention encompasses isolated, purified, or recombinant polynucleotides 
which consist of, consist essentially of, or comprise a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 
28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 nucleotides of a sequence selected from 
the group consisting of the sequences of SEQ ID Nos: 1-241 and sequences fully complementary thereto, 
to the extent that a contiguous span of these lengths is consistent with the lengths of said selected 
sequences, wherein said contiguous span comprises at least 1, 2, 3, 5, 10, 15, 18, 20, 25, 28, 30, 35, 40, 50, 
75, 100, 150, 200, 300, 400 or 500 nucleotides of a preferred polynucleotide described in Tables Va and 
Vb, or a sequence complementary thereto. The present invention also encompasses isolated, purified, or 
recombinant polynucleotides comprising, consisting essentially of, or consisting of a contiguous span of 
at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 
nucleotides of a polynucleotide selected from the group consisting of the sequences of SEQ ID Nos: 1-241 
and sequences fully complementary thereto, wherein said contiguous span comprises a preferred 
polynucleotide described in Tables Va and Vb, or a sequence complementary thereto, to the extent that a 
contiguous span of these lengths is consistent with the length of the selected sequence. The present 
invention also encompasses isolated, purified, or recombinant nucleic acids which comprise, consist of or 
consist essentially of a contiguous span of a polynucleotide selected from the group consisting of the 
sequences of SEQ ID Nos: 1-241 and sequences fully complementary thereto, wherein said contiguous 
span comprises a preferred polynucleotide described in Tables Va and Vb, or a sequence complementary 
thereto. 
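
The requirement that a contiguous span comprise a given number of nucleotides of a preferred 
fragment amounts to an interval overlap on the position numbering. The sketch below is one possible 
reading, with hypothetical function names and positions given as inclusive (start, end) pairs; it is 
not a method set out in the specification. 

```python
def shared_positions(span, fragment):
    """Number of positions common to two inclusive (start, end) intervals."""
    (s_start, s_end), (f_start, f_end) = span, fragment
    return max(0, min(s_end, f_end) - max(s_start, f_start) + 1)


def span_qualifies(span, preferred_fragments, min_shared=1):
    """True if the span shares at least min_shared positions with any
    of the preferred fragments."""
    return any(
        shared_positions(span, fragment) >= min_shared
        for fragment in preferred_fragments
    )
```

For example, a span 10-40 shares 11 positions with an invented preferred fragment 30-100, so it 
would satisfy a threshold of 10 nucleotides but not one of 15. 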

Other preferred fragments of the invention are polynucleotides comprising polynucleotides 
encoding domains of polypeptides. Such fragments may be used to obtain other polynucleotides encoding 
polypeptides having similar domains using hybridization or RT-PCR techniques. Alternatively, these 
fragments may be used to express a polypeptide domain which may present a specific biological property. 
Preferred domains for the GENSET polypeptides of the invention are described in Table VI. Thus, 
another object of the invention is an isolated, purified or recombinant polynucleotide encoding a 
polypeptide consisting of, consisting essentially of, or comprising a contiguous span of at least 5, 6, 8, 10, 
12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, 200, 250, 300, 350, 400, 450 or 500 consecutive amino 
acids of a sequence selected from the group consisting of the sequences of SEQ ID Nos: 242-482, to the 
extent that a contiguous span of these lengths is consistent with the lengths of said selected sequence, 
where said contiguous span comprises at least 1, 2, 3, 5, or 10 of the amino acid positions of a domain of 
said selected sequence. The present invention also encompasses isolated, purified or recombinant 
polynucleotides encoding a polypeptide comprising a contiguous span of at least 5, 6, 8, 10, 12, 15, 20, 
25, 30, 35, 40, 50, 60, 75, 100, 150, 200, 250, 300, 350, 400, 450 or 500 consecutive amino acids of a 
sequence selected from the group consisting of sequences of SEQ ID Nos: 242-482, to the extent that a 
contiguous span of these lengths is consistent with the lengths of said selected sequence, where said 
contiguous span is a domain of said selected sequence. The present invention also encompasses isolated, 
purified or recombinant polynucleotides encoding a polypeptide comprising a domain of a sequence 
selected from the group consisting of the sequences of SEQ ID Nos: 242-482. 

The present invention further encompasses any combination of the polynucleotide fragments 
listed in this section. 

Oligonucleotide primers and probes 

The present invention also encompasses fragments of GENSET polynucleotides for use as 
primers and probes. Polynucleotides derived from the GENSET genomic and cDNA sequences are useful 
in order to detect the presence of at least a copy of a GENSET polynucleotide or fragment, complement, 
or variant thereof in a test sample. 

Structural definition 

Any polynucleotide of the invention may be used as a primer or probe. Particularly preferred 
probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising 
a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000, 
1500 or 2000 nucleotides of a sequence selected from the group consisting of the GENSET genomic 
sequences, the cDNA sequences and the sequences fully complementary thereto. Another object of the 
invention is a purified, isolated, or recombinant polynucleotide comprising the nucleotide sequence of a 
sequence selected from the group consisting of the sequences of SEQ ID Nos: 1-241, sequences of clone 
inserts of the deposited clone pool, sequences fully complementary thereto, allelic variants thereof, and 
fragments thereof. Moreover, preferred probes and primers of the invention include purified, isolated, or 
recombinant GENSET cDNAs consisting of, consisting essentially of, or comprising the sequences of 
SEQ ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool. Particularly preferred 
probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a 
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000, 1500 or 2000 
nucleotides of a sequence selected from the group consisting of the sequences of SEQ ID Nos: 1-241 and the 
sequences fully complementary thereto. 

Design of primers and probes 

A probe or a primer according to the invention is between 8 and 1000 nucleotides in length, or 
is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500, 1000, 1500 or 2000 
nucleotides in length. More particularly, the length of these probes and primers can range from 8, 10, 15, 
20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30 nucleotides. 
Shorter probes and primers tend to lack specificity for a target nucleic acid sequence and generally require 
cooler temperatures to form sufficiently stable hybrid complexes with the template. Longer probes and 
primers are expensive to produce and can sometimes self-hybridize to form hairpin structures. The 
appropriate length for primers and probes under a particular set of assay conditions may be empirically 
determined by one of skill in the art. The formation of stable hybrids depends on the melting temperature 
(Tm) of the DNA. The Tm depends on the length of the primer or probe, the ionic strength of the solution 
and the G+C content. The higher the G+C content of the primer or probe, the higher the melting 
temperature, because G:C pairs are held by three hydrogen bonds whereas A:T pairs have only two. The GC 
content in the probes of the invention usually ranges between 10 and 75%, preferably between 35 and 60%, 
and more preferably between 40 and 55%. 

For amplification purposes, pairs of primers with approximately the same Tm are preferable. 
Primers may be designed using the OSP software (Hillier and Green, 1991), the disclosure of which is 
incorporated by reference in its entirety, based on GC content and melting temperatures of 
oligonucleotides, or using PC-Rare (http://bioinformatics.weizmann.ac.il/software/PC-Rare/doc/manuel.html) 
based on the octamer frequency disparity method (Griffais et al., 1991), the 
disclosure of which is incorporated by reference in its entirety. DNA amplification techniques are well 
known to those skilled in the art. Amplification techniques that can be used in the context of the present 
invention include, but are not limited to, the ligase chain reaction (LCR) described in EP-A- 320 308, WO 
9320227 and EP-A-439 182, the polymerase chain reaction (PCR, RT-PCR) and techniques such as the 
nucleic acid sequence based amplification (NASBA) described in Guatelli et al. (1990) and in Compton 
(1991), Q-beta amplification as described in European Patent Application No 4544610, strand 
displacement amplification as described in Walker et al. (1996) and EP A 684 315, and target mediated 
amplification as described in PCT Publication WO 9322461, the disclosures of which are incorporated by 
reference in their entireties. 
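
The dependence of Tm on length and G+C content discussed above, and the preference for primer pairs 
of similar Tm, can be illustrated with a short Python sketch. It uses the Wallace rule of thumb 
(2 degrees C per A or T, 4 degrees C per G or C), which is an assumption introduced here, is not the 
method of OSP or PC-Rare, applies only to short oligonucleotides, and ignores ionic strength. The 
function names and the 2 degree tolerance are likewise hypothetical. 

```python
def gc_content(seq):
    """Percentage of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate Tm in degrees C by the Wallace rule (assumed here):
    2 per A or T plus 4 per G or C. Ionic strength is not modeled."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_matched(primer_a, primer_b, max_diff=2):
    """True if the two primers have approximately the same estimated Tm.
    The tolerance max_diff is an illustrative choice."""
    return abs(wallace_tm(primer_a) - wallace_tm(primer_b)) <= max_diff
```

Under this rule two 16-mers of equal length can differ widely in Tm: an all-A oligonucleotide is 
estimated at 32 and an all-G oligonucleotide at 64, which reflects the three hydrogen bonds of G:C 
pairs noted above. 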

LCR and Gap LCR are exponential amplification techniques; both depend on DNA ligase to join 
adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs are used 
which include two primary (first and second) and two secondary (third and fourth) probes, all of which 
are employed in molar excess to target. The first probe hybridizes to a first segment of the target strand 
and the second probe hybridizes to a second segment of the target strand, the first and second segments 
being contiguous so that the primary probes abut one another in 5' phosphate-3' hydroxyl relationship, 
and so that a ligase can covalently fuse or ligate the two probes into a fused product. In addition, a third 
(secondary) probe can hybridize to a portion of the first probe and a fourth (secondary) probe can 
hybridize to a portion of the second probe in a similar abutting fashion. Of course, if the target is initially 
double stranded, the secondary probes also will hybridize to the target complement in the first instance. 
Once the ligated strand of primary probes is separated from the target strand, it will hybridize with the 
third and fourth probes, which can be ligated to form a complementary, secondary ligated product. It is 
important to realize that the ligated products are functionally equivalent to either the target or its 
complement. By repeated cycles of hybridization and ligation, amplification of the target sequence is 
achieved. A method for multiplex LCR has also been described (WO 9320227), the disclosure of which 
is incorporated by reference in its entirety. Gap LCR (GLCR) is a version of LCR where the probes are 
not adjacent but are separated by 2 to 3 bases. 
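
The relationship among the four LCR probes described above can be sketched as follows. This is a 
simplified illustration for standard LCR with abutting probes (not Gap LCR); the function names, the 
0-based `junction` index and the equal probe `length` are assumptions made for the example, and all 
sequences are written 5' to 3'. 

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_lcr_probes(target, junction, length):
    """Return the four probes for a ligation junction on the target strand.

    The first and second (primary) probes are complementary to the two
    contiguous target segments that meet at `junction`; the third and
    fourth (secondary) probes are complementary to the primary probes,
    so they carry the sequence of the target segments themselves.
    """
    if junction - length < 0 or junction + length > len(target):
        raise ValueError("probes would extend past the target")
    target = target.upper()
    first_segment = target[junction - length:junction]
    second_segment = target[junction:junction + length]
    return {
        "first": reverse_complement(first_segment),
        "second": reverse_complement(second_segment),
        "third": first_segment,
        "fourth": second_segment,
    }
```

Ligating the secondary probes reproduces the target segment, and ligating the primary probes gives 
its reverse complement, which is why the ligated products are functionally equivalent to the target 
or its complement. 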

For amplification of mRNAs, it is within the scope of the present invention to reverse transcribe 
mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or, to use a single enzyme for both 
steps as described in U.S. Patent No. 5,322,770 or, to use Asymmetric Gap LCR (RT-AGLCR) as 
described by Marshall et al. (1994), the disclosures of which are incorporated by reference in their entireties. 
AGLCR is a modification of GLCR that allows the amplification of RNA. 

The PCR technology is the preferred amplification technique used in the present invention. A 
variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see 
White (1997), Erlich (1992) and the publication entitled "PCR Methods and Applications" (1991, Cold 
Spring Harbor Laboratory Press), the disclosures of which are incorporated by reference in their entireties. 
In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be 
amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable 
polymerase such as Taq polymerase, Pfu polymerase, Tth polymerase or Vent polymerase. The nucleic 
acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic 
acid sequences in the sample. The hybridized primers are extended. Thereafter, another cycle of 
denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce 
an amplified fragment containing the nucleic acid sequence between the primer sites. PCR has further 
been described in several patents including US Patents 4,683,195; 4,683,202; and 4,965,188, the 
disclosures of which are incorporated herein by reference in their entireties. 
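
The outcome of the PCR cycles described above, an amplified fragment containing the sequence between 
the primer sites, can be predicted with a minimal in silico sketch. It assumes exact primer matches 
on a single template strand written 5' to 3'; the function names are hypothetical and the example 
sequences are invented. 

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the expected amplicon, or None if either primer site is absent.

    The forward primer matches the template strand directly; the reverse
    primer anneals to the opposite strand, so its site on the template is
    the reverse complement of the primer.
    """
    template = template.upper()
    forward = forward.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    # Search for the reverse primer site downstream of the forward primer.
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]
```

With the invented template TTACGGATCCAAGTCTGAGCATTCGAATTCGGTA, forward primer ACGGAT and reverse 
primer CGAATG, the sketch returns the 23-nucleotide product ACGGATCCAAGTCTGAGCATTCG. 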

Preparation of primers and probes 

The primers and probes can be prepared by any suitable method, including, for example, cloning 
and restriction of appropriate sequences and direct chemical synthesis by a method such as the 
phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979), the 
diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described in EP 0 
707 592, which disclosures are hereby incorporated by reference in their entireties. 

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs such as, 
for example, peptide nucleic acids, which are disclosed in International Patent Application WO 92/20702, and 
morpholino analogs, which are described in U.S. Patents Numbered 5,185,444; 5,034,506 and 5,142,047, 
which disclosures are hereby incorporated by reference in their entireties. The probe may have to be 
rendered "non-extendable" in that additional dNTPs cannot be added to the probe. In and of themselves 
analogs usually are non-extendable and nucleic acid probes can be rendered non-extendable by modifying 
the 3' end of the probe such that the hydroxyl group is no longer capable of participating in elongation. 
For example, the 3' end of the probe can be functionalized with the capture or detection label to thereby 
consume or otherwise block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be 
cleaved, replaced or modified. U.S. Patent Application Serial No. 07/049,061 filed April 19, 1993, which 
disclosure is hereby incorporated by reference in its entirety, describes modifications which can be used 
to render a probe non-extendable. 

Labeling of probes 

Any of the polynucleotides of the present invention can be labeled, if desired, by incorporating 
any label known in the art to be detectable by spectroscopic, photochemical, biochemical, 
immunochemical, or chemical means. For example, useful labels include radioactive substances 
(including 32P, 35S, 3H, 125I), fluorescent dyes (including 5-bromodeoxyuridine, fluorescein, 
acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at their 3' and 5' 
ends. Examples of non-radioactive labeling of nucleic acid fragments are described in the French patent 
No. FR-7810975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988), which disclosures are hereby 
incorporated by reference in their entireties. In addition, the probes according to the present invention 
may have structural characteristics such that they allow signal amplification, such structural 
characteristics being, for example, those of branched DNA probes as described by Urdea et al. in 1991 or in 
the European patent No. EP 0 225 807 (Chiron), which disclosures are hereby incorporated by reference 
in their entireties. 

The detectable probe may be single stranded or double stranded and may be made using 
techniques known in the art, including in vitro transcription, nick translation, or kinase reactions. A 
nucleic acid sample containing a sequence capable of hybridizing to the labeled probe is contacted with 
the labeled probe. If the nucleic acid in the sample is double stranded, it may be denatured prior to 
contacting the probe. In some applications, the nucleic acid sample may be immobilized on a surface 
such as a nitrocellulose or nylon membrane. The nucleic acid sample may comprise nucleic acids 
obtained from a variety of sources, including genomic DNA, cDNA libraries, RNA, or tissue samples. 

Procedures used to detect the presence of nucleic acids capable of hybridizing to the detectable 
probe include well known techniques such as Southern blotting, Northern blotting, dot blotting, colony 
hybridization, and plaque hybridization. In some applications, the nucleic acid capable of hybridizing to 
the labeled probe may be cloned into vectors such as expression vectors, sequencing vectors, or in vitro 
transcription vectors to facilitate the characterization and expression of the hybridizing nucleic acids in 
the sample. For example, such techniques may be used to isolate and clone sequences in a genomic 
library or cDNA library which are capable of hybridizing to the detectable probe as described herein. 

Immobilization of probes 

A label can also be used to capture the primer, so as to facilitate the immobilization of either the 
primer or a primer extension product, such as amplified DNA, on a solid support. A capture label is 
attached to the primers or probes and can be a specific binding member which forms a binding pair with 
the solid phase reagent's specific binding member (e.g. biotin and streptavidin). Therefore, depending 
upon the type of label carried by a polynucleotide or a probe, it may be employed to capture or to detect 
the target DNA. Further, it will be understood that the polynucleotides, primers or probes provided 
herein, may, themselves, serve as the capture label. For example, in the case where a solid phase 
reagent's binding member is a nucleic acid sequence, it may be selected such that it binds a 
complementary portion of a primer or probe to thereby immobilize the primer or probe to the solid phase. 
In cases where a polynucleotide probe itself serves as the binding member, those skilled in the art will 
recognize that the probe will contain a sequence or "tail" that is not complementary to the target. In the 
case where a polynucleotide primer itself serves as the capture label, at least a portion of the primer will 
be free to hybridize with a nucleic acid on a solid phase. DNA labeling techniques are well known to the 
skilled technician. 

The probes of the present invention are useful for a number of purposes. They can notably be 
used in Southern hybridization to genomic DNA. The probes can also be used to detect PCR 
amplification products. They may also be used to detect mismatches in the GENSET gene or mRNA 
using other techniques. 

Any of the polynucleotides, primers and probes of the present invention can be conveniently 
immobilized on a solid support. The solid support is not critical and can be selected by one skilled in the 
art. Thus, latex particles, microparticles, magnetic beads, non-magnetic beads (including polystyrene 
beads), membranes (including nitrocellulose strips), plastic tubes, walls of microtiter wells, glass or 
silicon chips, sheep (or other suitable animal's) red blood cells and duracytes are all suitable examples. 
Suitable methods for immobilizing nucleic acids on solid phases include ionic, hydrophobic, covalent 
interactions and the like. A solid support, as used herein, refers to any material which is insoluble, or can 
be made insoluble by a subsequent reaction. The solid support can be chosen for its intrinsic ability to 
attract and immobilize the capture reagent. Alternatively, the solid phase can retain an additional receptor 
which has the ability to attract and immobilize the capture reagent. The additional receptor can include a 
charged substance that is oppositely charged with respect to the capture reagent itself or to a charged 
substance conjugated to the capture reagent. As yet another alternative, the receptor molecule can be any 
specific binding member which is immobilized upon (attached to) the solid support and which has the 
ability to immobilize the capture reagent through a specific binding reaction. The receptor molecule 
enables the indirect binding of the capture reagent to a solid support material before the performance of 
the assay or during the performance of the assay. The solid phase thus can be a plastic, derivatized 
plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet, 
bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other 
configurations known to those of ordinary skill in the art. The polynucleotides of the invention can be 
attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 
25 distinct polynucleotides of the invention to a single solid support. In addition, polynucleotides other 
than those of the invention may be attached to the same solid support as one or more polynucleotides of 
the invention. 

Oligonucleotide array 

A substrate comprising a plurality of oligonucleotide primers or probes of the invention may be 
used for detecting or amplifying targeted sequences in GENSET genes, for 
detecting mutations in the coding or in the non-coding sequences of GENSET genes, and 
for determining GENSET gene expression in different contexts such as in different tissues, at different 
stages of a process (embryo development, disease treatment), and in patients versus healthy individuals as 
described elsewhere in the application. 

As used herein, the term "array" means a one dimensional, two dimensional, or multidimensional 
arrangement of nucleic acids of sufficient length to permit specific detection of gene expression. For 
example, the array may contain a plurality of nucleic acids derived from genes whose expression levels 
are to be assessed. The array may include a GENSET genomic DNA, a GENSET cDNA, sequences 
complementary thereto or fragments thereof. Preferably, the fragments are at least 12, 15, 18, 20, 25, 30, 
35, 40 or 50 nucleotides in length. More preferably, the fragments are at least 100 nucleotides in length. 
Even more preferably, the fragments are more than 100 nucleotides in length. In some embodiments the 
fragments may be more than 500 nucleotides in length. 

Any polynucleotide provided herein may be attached in overlapping areas or at random locations 
on the solid support. Alternatively the polynucleotides of the invention may be attached in an ordered 
array wherein each polynucleotide is attached to a distinct region of the solid support which does not 
overlap with the attachment site of any other polynucleotide. Preferably, such an ordered array of 
polynucleotides is designed to be "addressable" where the distinct locations are recorded and can be 
accessed as part of an assay procedure. Addressable polynucleotide arrays typically comprise a plurality 
of different oligonucleotide probes that are coupled to a surface of a substrate in different known 
locations. Knowledge of the precise location of each polynucleotide makes these 
"addressable" arrays particularly useful in hybridization assays. Any addressable array technology 
known in the art can be employed with the polynucleotides of the invention. One particular embodiment 
of these polynucleotide arrays is known as the Genechips™, and has been generally described in US 
Patent 5,143,854; PCT publications WO 90/15070 and 92/10092, which disclosures are hereby 
incorporated by reference in their entireties. These arrays may generally be produced using mechanical 
synthesis methods or light directed synthesis methods which incorporate a combination of 
photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991), which 
disclosure is hereby incorporated by reference in its entirety. The immobilization of arrays of 
oligonucleotides on solid supports has been rendered possible by the development of a technology 
generally identified as "Very Large Scale Immobilized Polymer Synthesis" (VLSIPS™) in which, 
typically, probes are immobilized in a high density array on a solid surface of a chip. Examples of 
VLSIPS™ technologies are provided in US Patents 5,143,854; and 5,412,087 and in PCT Publications 
WO 90/15070, WO 92/10092 and WO 95/11995, which disclosures are hereby incorporated by reference 
in their entireties, which describe methods for forming oligonucleotide arrays through techniques such as 
light-directed synthesis techniques. In designing strategies aimed at providing arrays of nucleotides 
immobilized on solid supports, further presentation strategies were developed to order and display the 
oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence 
information. Examples of such presentation strategies are disclosed in PCT Publications WO 94/12305, 
WO 94/11530, WO 97/29212 and WO 97/31256, the disclosures of which are incorporated herein by 
reference in their entireties. 
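
The notion of an "addressable" array, in which each polynucleotide occupies a distinct, recorded 
location that can be looked up during an assay, can be modeled with a small data structure. The 
Python sketch below is purely illustrative; the class name, the row and column addressing and the 
example probe sequences are assumptions, and no particular array technology is represented. 

```python
from dataclasses import dataclass, field


@dataclass
class AddressableArray:
    """Ordered array in which each probe occupies one non-overlapping spot."""
    rows: int
    cols: int
    _spots: dict = field(default_factory=dict)

    def attach(self, row, col, probe):
        """Record a probe at a distinct (row, col) location."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("location outside the support")
        if (row, col) in self._spots:
            raise ValueError("location already occupied")
        self._spots[(row, col)] = probe

    def probe_at(self, row, col):
        """Return the probe recorded at a location, or None if empty."""
        return self._spots.get((row, col))

    def locations_of(self, probe):
        """Return every recorded location carrying the given probe."""
        return sorted(loc for loc, p in self._spots.items() if p == probe)
```

Because each location is recorded, a hybridization signal observed at a given spot can be traced 
back to the probe attached there. 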

Consequently, the invention concerns an array of nucleic acid molecules comprising at least one 
polynucleotide of the invention, particularly a probe or primer as described herein. Preferably, the 
invention concerns an array of nucleic acid comprising at least two polynucleotides of the invention, 
particularly probes or primers as described herein. Preferably, the invention concerns an array of nucleic 
acid comprising at least five polynucleotides of the invention, particularly probes or primers as described 
herein. 

A preferred embodiment of the present invention is an array of polynucleotides of at least 12, 15, 
18, 20, 25, 30, 35, 40, 50, 100, 500, 1000, 1500 or 2000 nucleotides in length which includes at least 1, 2, 
5, 10, 15, 20, 35, 50, 100, 150 or 200 sequences selected from the group consisting of the sequences of 
SEQ ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool, sequences fully 
complementary thereto, and fragments thereof. 

Methods of making the polynucleotides of the invention 

The present invention also comprises methods of making the polynucleotides of the invention, 
including the polynucleotides of SEQ ID Nos: 1-241, genomic DNA obtainable therefrom, or fragments 
thereof. These methods comprise sequentially linking together nucleotides to produce the nucleic acids 
having the preceding sequences. Polynucleotides of the invention may be synthesized either 
enzymatically using techniques well known to those skilled in the art including amplification or 
hybridization-based methods as described herein, or chemically. 

A variety of chemical methods of synthesizing nucleic acids are known to those skilled in the art. 
In many of these methods, synthesis is conducted on a solid support. These include the 3' 
phosphoramidite methods in which the 3' terminal base of the desired oligonucleotide is immobilized on 
an insoluble carrier. The nucleotide base to be added is blocked at the 5' hydroxyl and activated at the 3' 
hydroxyl so as to cause coupling with the immobilized nucleotide base. Deblocking of the new 
immobilized nucleotide compound and repetition of the cycle will produce the desired polynucleotide. 
Alternatively, polynucleotides may be prepared as described in U.S. Patent No. 5,049,656, which 
disclosure is hereby incorporated by reference in its entirety. In some embodiments, several 
polynucleotides prepared as described above are ligated together to generate longer polynucleotides 
having a desired sequence. 
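
Because the 3' terminal base is immobilized first and each incoming nucleotide couples through its 
activated 3' position, the chain in the cycle described above grows from the 3' end toward the 5' 
end. The following Python sketch only illustrates that order of addition; the function name is 
hypothetical and no chemistry (blocking, deblocking, coupling efficiency) is modeled. 

```python
def synthesis_steps(sequence):
    """Return the growing chain after each cycle for a sequence written
    5' to 3'. The first entry is the immobilized 3' terminal base; each
    later entry adds one base at the 5' end."""
    sequence = sequence.upper()
    if not sequence:
        return []
    chain = sequence[-1]                 # 3' terminal base on the carrier
    steps = [chain]
    for base in reversed(sequence[:-1]):
        chain = base + chain             # couple the next base at the 5' end
        steps.append(chain)
    return steps
```

For the invented sequence ATGC, the successive products are C, GC, TGC and ATGC. 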

Polypeptides of the invention 

The term "GENSET polypeptides" is used herein to embrace all of the proteins and polypeptides 
of the present invention. The present invention encompasses GENSET polypeptides, including 
recombinant, isolated or purified GENSET polypeptides consisting of, consisting essentially of, or 
comprising a sequence selected from the group consisting of SEQ ID Nos: 242-482, the polypeptides 
encoded by human cDNAs contained in the deposited clones, the mature proteins included in SEQ ID 
Nos: 242-272 and 274-384, mature proteins encoded by the clone inserts of the deposited clone pool, and 
variants thereof. Other objects of the invention are polypeptides encoded by the polynucleotides of the 
invention as well as fusion polypeptides comprising such polypeptide. 

Polypeptide variants 

The present invention further provides for GENSET polypeptides encoded by allelic and splice 
variants, orthologs, and/or species homologues. Procedures known in the art can be used to obtain allelic
variants, splice variants, orthologs, and/or species homologues of polynucleotides encoding
polypeptides of the group consisting of SEQ ID Nos: 242-482, mature proteins included in SEQ ID Nos:
242-272 and 274-384, and polypeptides, either full-length or mature, encoded by the clone inserts of the
deposited clone pool, using information from the sequences disclosed herein or the clones deposited with 
the ATCC. 

The polypeptides of the present invention also include polypeptides having an amino acid
sequence at least 50% identical, more preferably at least 60% identical, and still more preferably 70%,
80%, 90%, 95%, 96%, 97%, 98% or 99% identical to a polypeptide selected from the group consisting of
the sequences of SEQ ID Nos: 242-482, mature proteins included in sequences of SEQ ID Nos: 242-272
and 274-384, and full-length or mature polypeptides encoded by the clone inserts of the deposited clone
pool. By a polypeptide having an amino acid sequence at least, for example, 95% "identical" to a query
amino acid sequence of the present invention, it is intended that the amino acid sequence of the subject 
polypeptide is identical to the query sequence except that the subject polypeptide sequence may include
up to five amino acid alterations per each 100 amino acids of the query amino acid sequence. In other
words, to obtain a polypeptide having an amino acid sequence at least 95% identical to a query amino
acid sequence, up to 5% (5 of 100) of the amino acid residues in the subject sequence may be inserted or
deleted (indels), or substituted with another amino acid.
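
The arithmetic behind this definition can be sketched as follows. This is a minimal illustration under stated assumptions: the two sequences are supplied already aligned (with `-` marking a gap), every non-matching column counts as one alteration, and the total is expressed relative to the ungapped query length. The function names are hypothetical, and no particular alignment program is implied.

```python
def max_alterations(query_length: int, min_identity_pct: int) -> int:
    """Most residue alterations allowed at a given identity threshold."""
    return (query_length * (100 - min_identity_pct)) // 100


def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity of a pre-aligned pair ('-' marks a gap).

    Every column that is not an exact match counts as one alteration
    (insertion, deletion or substitution), relative to the number of
    residues in the query.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for q in aligned_query if q != "-")
    if query_length == 0:
        raise ValueError("query contains no residues")
    alterations = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q != s
    )
    return 100.0 * (query_length - alterations) / query_length


if __name__ == "__main__":
    print(max_alterations(100, 95))
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

Under this counting convention, a 100-residue query at a 95% threshold tolerates five altered positions, matching the "5 of 100" figure in the text.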

Further polypeptides of the present invention include polypeptides which have at least 90%
similarity, more preferably at least 95% similarity, and still more preferably at least 96%, 97%, 98% or
99% similarity to those described above. By a polypeptide having an amino acid sequence at least, for 
example, 95% "similar" to a query amino acid sequence of the present invention, it is intended that the 
amino acid sequence of the subject polypeptide is similar (i.e., contains identical or equivalent amino acid
residues) to the query sequence except that the subject polypeptide sequence may include up to five
amino acid alterations per each 100 amino acids of the query amino acid sequence. In other words, to 
obtain a polypeptide having an amino acid sequence at least 95% similar to a query amino acid sequence, 
up to 5% (5 of 100) of the amino acid residues in the subject sequence may be inserted or deleted (indels),
or substituted with another non-equivalent amino acid.
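
The difference between similarity and identity can be illustrated with a short sketch in which a substitution by an equivalent residue is not counted as an alteration. The equivalence classes below are an assumption chosen only for illustration; this passage does not specify which residues are to be treated as equivalent, and the function names are hypothetical.

```python
# Hypothetical equivalence classes, chosen only for illustration.
EQUIVALENT_GROUPS = ("AVLIM", "FWY", "KRH", "DE", "STNQ")


def equivalent(a: str, b: str) -> bool:
    """True if two alignment symbols are identical or share a class."""
    if a == b:
        return True
    return any(a in group and b in group for group in EQUIVALENT_GROUPS)


def percent_similarity(aligned_query: str, aligned_subject: str) -> float:
    """Percent similarity of a pre-aligned pair ('-' marks a gap).

    Only gaps and non-equivalent substitutions count as alterations,
    relative to the number of residues in the query.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for q in aligned_query if q != "-")
    if query_length == 0:
        raise ValueError("query contains no residues")
    alterations = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if not equivalent(q, s)
    )
    return 100.0 * (query_length - alterations) / query_length


if __name__ == "__main__":
    # Leu -> Val falls within one illustrative class; Leu -> Asp does not.
    print(percent_similarity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(percent_similarity("ACDEFGHIKL", "ACDEFGHIKD"))
```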

These alterations of the reference sequence may occur at the amino or carboxy terminal positions
of the reference amino acid sequence or anywhere between those terminal positions, interspersed either
individually among residues in the reference sequence or in one or more contiguous groups within the
reference sequence. The query sequence may be an entire amino acid sequence selected from the group
consisting of sequences of SEQ ID Nos: 242-482 and those encoded by the clone inserts of the deposited
clone pool or any fragment specified as described herein.

The variant polypeptides described herein are included in the present invention regardless of 
whether they have their normal biological activity. This is because even where a particular polypeptide 
molecule does not have biological activity, one of skill in the art would still know how to use the 
polypeptide, for instance, as a vaccine or to generate antibodies. Other uses of the polypeptides of the
present invention that do not have GENSET biological activity include, inter alia, as epitope tags, in
epitope mapping, and as molecular weight markers on SDS-PAGE gels or on molecular sieve gel 
filtration columns using methods known to those of skill in the art. As described below, the polypeptides 
of the present invention can also be used to raise polyclonal and monoclonal antibodies, which are useful 
in assays for detecting GENSET protein expression or as agonists and antagonists capable of enhancing
or inhibiting GENSET protein function. Further, such polypeptides can be used in the yeast two-hybrid
system to "capture" GENSET protein binding proteins, which are also candidate agonists and antagonists 
according to the present invention (See, e.g., Fields et al., 1989), which disclosure is hereby incorporated
by reference in its entirety.

Preparation of the polypeptides of the invention

The polypeptides of the present invention can be prepared in any suitable manner. Such 
polypeptides include isolated naturally occurring polypeptides, recombinantly produced polypeptides, 
synthetically produced polypeptides, or polypeptides produced by a combination of these methods. The 
polypeptides of the present invention are preferably provided in an isolated form, and may be partially or
preferably substantially purified. 

Consequently, the present invention also comprises methods of making the polypeptides of the 
invention, particularly polypeptides encoded by the cDNAs of SEQ ID Nos: 1-241, mature proteins 
encoded by fragments of SEQ ID Nos: 1-31 and 33-143, full-length and mature polypeptides encoded
by the clone inserts of the deposited clone pool, genomic DNA obtainable therefrom, or fragments thereof
and methods of making the polypeptides of SEQ ID Nos: 242-482, mature polypeptides included in SEQ
ID Nos: 242-272 and 274-384, or fragments thereof. The methods comprise sequentially linking together
amino acids to produce the polypeptides having the preceding sequences. In some embodiments,
the polypeptides made by these methods are 150 amino acids or less in length. In other embodiments, the
polypeptides made by these methods are 120 amino acids or less in length.

Isolation 

From natural sources 

The GENSET proteins of the invention may be isolated from natural sources, including bodily 
fluids, tissues and cells, whether directly isolated or cultured cells, of humans or non-human animals. 

Methods for extracting and purifying natural proteins are known in the art, and include the use of
detergents or chaotropic agents to disrupt particles followed by differential extraction and separation of 
the polypeptides by ion exchange chromatography, affinity chromatography, sedimentation according to 
density, and gel electrophoresis. See, for example, "Methods in Enzymology", Academic Press, 1993, for
a variety of methods for purifying proteins, which disclosure is hereby incorporated by reference in its
entirety. Polypeptides of the invention also can be purified from natural sources using antibodies directed
against the polypeptides of the invention, such as those described herein, in methods which are well 
known in the art of protein purification. 

From recombinant sources 

Preferably, the GENSET polypeptides of the invention are recombinantly produced using routine 
expression methods known in the art. The polynucleotide encoding the desired polypeptide is operably
linked to a promoter in an expression vector suitable for any convenient host. Both eukaryotic and
prokaryotic host systems are used in forming recombinant polypeptides. The polypeptide is then isolated
from lysed cells or from the culture medium and purified to the extent needed for its intended use. 

Any GENSET polynucleotide, including those described in SEQ ID Nos: 1-241, those of clone 
inserts of the deposited clone pool, and allelic variants thereof may be used to express GENSET 
polypeptides. The nucleic acid encoding the GENSET polypeptide to be expressed is operably linked to a
promoter in an expression vector using conventional cloning technology. The GENSET insert in the
expression vector may comprise the full coding sequence for the GENSET protein or a portion thereof,
especially the sequence for a mature polypeptide. For example, the GENSET derived insert may encode a
polypeptide comprising at least 6, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200
consecutive amino acids of a GENSET protein selected from the group consisting of sequences of SEQ
ID Nos: 242-482 and polypeptides encoded by the clone inserts of the deposited clone pool. 

Consequently, a further embodiment of the present invention is a method of making a polypeptide 
comprising a protein selected from the group consisting of sequences of SEQ ID Nos: 242-482 and 
polypeptides encoded by the clone inserts of the deposited clone pool, said method comprising the steps
of

a) obtaining a cDNA comprising a sequence selected from the group consisting of i) the
sequences SEQ ID Nos: 1-241, ii) the sequences of clone inserts of the deposited clone pool, iii)
sequences encoding one of the polypeptides of SEQ ID Nos: 242-482, and iv) sequences of
polynucleotides encoding a polypeptide which is encoded by one of the clone inserts of the deposited clone
pool;

b) inserting said cDNA in an expression vector such that the cDNA is operably linked to a 
promoter; and 

c) introducing said expression vector into a host cell whereby said host cell produces said 
polypeptide. 

In one aspect of this embodiment, the method further comprises the step of isolating the
polypeptide. Another embodiment of the present invention is a polypeptide obtainable by the method
described in the preceding paragraph. 

The expression vector is any of the mammalian, yeast, insect or bacterial expression systems 
known in the art. Commercially available vectors and expression systems are available from a variety of
suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega
(Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and 
facilitate proper protein folding, the codon context and codon pairing of the sequence are optimized for the
particular expression organism in which the expression vector is introduced, as explained in U.S. Patent
No. 5,082,767, which disclosure is hereby incorporated by reference in its entirety.

In one embodiment, the entire coding sequence of a GENSET cDNA and the 3'UTR through the
poly A signal of the cDNA are operably linked to a promoter in the expression vector. Alternatively, if the
nucleic acid encoding a portion of the GENSET protein lacks a methionine to serve as the initiation site, 
an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional 
techniques. Similarly, if the insert from the GENSET cDNA lacks a poly A signal, this sequence can be
added to the construct by, for example, splicing out the Poly A signal from pSG5 (Stratagene) using BglI
and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector
pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine
Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector
includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene. The nucleic
acid encoding the GENSET protein or a portion thereof is obtained by PCR from a vector containing a
GENSET cDNA selected from the group consisting of the sequences of SEQ ID Nos: 1-241 and the clone
inserts of the deposited clone pool using oligonucleotide primers complementary to the GENSET cDNA
or portion thereof and containing restriction endonuclease sequences for PstI incorporated into the 5'
primer and BglII at the 5' end of the corresponding cDNA 3' primer, taking care to ensure that the
sequence encoding the GENSET protein or a portion thereof is positioned properly with respect to the
poly A signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI,
blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a
poly A signal and digested with BglII.
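
The primer layout described above can be sketched as follows: a PstI recognition sequence is placed at the 5' end of the forward (5') primer, and a BglII recognition sequence at the 5' end of the reverse (3') primer, which anneals as the reverse complement of the end of the region to be amplified. This is a minimal sketch, not a validated primer-design procedure: the function names, the default annealing length of 18 nucleotides and the example sequence are assumptions, and practical considerations such as extra flanking bases, melting temperature and reading frame are ignored.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(region: str, anneal_len: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) primers, each written 5' to 3'.

    The forward primer carries the PstI site and the reverse primer the
    BglII site, both at the primer's 5' end. anneal_len is an
    illustrative default, not a value taken from the text.
    """
    region = region.upper()
    if set(region) - set("ACGT"):
        raise ValueError("region must contain only A, C, G, T")
    if len(region) < anneal_len:
        raise ValueError("region is shorter than the annealing length")
    forward = PSTI_SITE + region[:anneal_len]
    reverse = BGLII_SITE + reverse_complement(region[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 24-nucleotide region used only to exercise the code.
    fwd, rev = design_primers("ATGGCCAAGTTTCCCGGGAAATAA", anneal_len=6)
    print(fwd)
    print(rev)
```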

Alternatively, cDNAs encoding secreted proteins may be cloned into pED6dpc2 (DiscoverEase,
Genetics Institute, Cambridge, MA). The resulting pED6dpc2 constructs may be transfected into a
suitable host cell, such as COS 1 cells. Methotrexate resistant cells are selected and expanded. 
Preferably, the secreted protein expressed from the cDNA is released into the culture medium thereby 
facilitating purification. 

In another embodiment, it is often advantageous to add to the recombinant polynucleotide
additional nucleotide sequence which codes for secretory or leader sequences, pro-sequences, sequences
which aid in purification, such as multiple histidine residues, or an additional sequence for stability during 
recombinant production. 

As a control, the expression vector lacking a cDNA insert is introduced into host cells or 
organisms.

Transfection of a GENSET expressing vector into mouse NIH 3T3 cells is but one embodiment
of introducing polynucleotides into host cells. Introduction of a polynucleotide encoding a polypeptide 
into a host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, 
cationic lipid-mediated transfection, electroporation, transduction, infection, or other methods. Such
methods are described in many standard laboratory manuals, such as Davis et al. (1986), which disclosure
is hereby incorporated by reference in its entirety. It is specifically contemplated that the polypeptides of 
the present invention may in fact be expressed by a host cell lacking a recombinant vector. 

Recombinant cell extracts, or proteins from the culture medium if the expressed polypeptide is 
secreted, are then prepared and proteins separated by gel electrophoresis. If desired, the proteins may be
ammonium sulfate precipitated or separated based on size or charge prior to electrophoresis. The proteins 
present are detected using techniques such as Coomassie or silver staining or using antibodies against the 
protein encoded by the GENSET cDNA of interest. Coomassie and silver staining techniques are familiar 
to those skilled in the art. 

Proteins from the host cells or organisms containing an expression vector which contains the
GENSET cDNA or a fragment thereof are compared to those from the control cells or organisms. The
presence of a band from the cells containing the expression vector which is absent in control cells 
indicates that the GENSET cDNA is expressed. Generally, the band corresponding to the protein 
encoded by the GENSET cDNA will have a mobility near that expected based on the number of amino
acids in the open reading frame of the cDNA. However, the band may have a mobility different than that
expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage. 
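
A rough expectation for the band position can be estimated from the open reading frame length. The sketch below assumes the common rule of thumb of about 110 Da per amino acid residue; that figure, the function name and the handling of the stop codon are assumptions made for illustration and are not taken from the disclosure. The estimate ignores composition, signal peptide cleavage and the modifications mentioned above.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption


def expected_mass_kda(orf_length_nt: int, includes_stop_codon: bool = True) -> float:
    """Approximate unmodified protein mass in kDa from ORF length.

    orf_length_nt is the ORF length in nucleotides and must be a
    positive multiple of three.
    """
    if orf_length_nt <= 0 or orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    residues = orf_length_nt // 3 - (1 if includes_stop_codon else 0)
    if residues <= 0:
        raise ValueError("ORF encodes no residues")
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # 603 nt including the stop codon corresponds to 200 residues.
    print(expected_mass_kda(603))
```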

Alternatively, the GENSET polypeptide to be expressed may also be a product of transgenic 
animals, e.g., as a component of the milk of transgenic cows, goats, pigs or sheep which are characterized
by somatic or germ cells containing a nucleotide sequence encoding the protein of interest. 

A polypeptide of this invention can be recovered and purified from recombinant cell cultures by
well-known methods including differential extraction, ammonium sulfate or ethanol precipitation, acid
extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic 
interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin 
chromatography. See, for example, "Methods in Enzymology", supra, for a variety of methods for
purifying proteins. Most preferably, high performance liquid chromatography ("HPLC") is employed for
purification. A recombinantly produced version of a GENSET polypeptide can be substantially purified 
using techniques described herein or otherwise known in the art, such as, for example, by the one-step 
method described in Smith and Johnson (1988), which disclosure is hereby incorporated by reference in 
its entirety. Polypeptides of the invention also can be purified from recombinant sources using antibodies
directed against the polypeptides of the invention, such as those described herein, in methods which are
well known in the art of protein purification. 

Preferably, the recombinantly expressed GENSET polypeptide is purified using standard 
immunochromatography techniques such as the one described in the section entitled "Immunoaffinity 
Chromatography". In such procedures, a solution containing the protein of interest, such as the culture
medium or a cell extract, is applied to a column having antibodies against the protein attached to the
chromatography matrix. The recombinant protein is allowed to bind the immunochromatography column. 
Thereafter, the column is washed to remove non-specifically bound proteins. The specifically bound 
protein is then released from the column and recovered using standard techniques.

If antibody production is not possible, the GENSET cDNA sequence or fragment thereof may be
incorporated into expression vectors designed for use in purification schemes employing chimeric
polypeptides. In such strategies the coding sequence of the GENSET cDNA or fragment thereof is 
inserted in frame with the gene encoding the other half of the chimera. The other half of the chimera may 
be beta-globin or a nickel binding polypeptide encoding sequence. A chromatography matrix having
antibody to beta-globin or nickel attached thereto is then used to purify the chimeric protein. Protease
cleavage sites may be engineered between the beta-globin gene or the nickel binding polypeptide and the 
GENSET cDNA or fragment thereof. Thus, the two polypeptides of the chimera may be separated from 
one another by protease digestion. 

One useful expression vector for generating beta-globin chimeras is pSG5 (Stratagene), which
encodes rabbit beta-globin. Intron II of the rabbit beta-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. 
These techniques as described are well known to those skilled in the art of molecular biology. Standard 
methods are published in methods texts such as Davis et al. (1986) and many of the methods are
available from Stratagene, Life Technologies, Inc., or Promega. Polypeptides may additionally be
produced from the construct using in vitro translation systems such as the In vitro Express™ Translation
Kit (Stratagene). 

Depending upon the host employed in a recombinant production procedure, the polypeptides of 
the present invention may be glycosylated or may be non-glycosylated. In addition, polypeptides of the 
invention may also include an initial modified methionine residue, in some cases as a result of
host-mediated processes. Thus, it is well known in the art that the N-terminal methionine encoded by the
translation initiation codon generally is removed with high efficiency from any protein after translation in 
all eukaryotic cells. While the N-terminal methionine on most proteins also is efficiently removed in most 
prokaryotes, for some proteins, this prokaryotic removal process is inefficient, depending on the nature of 
the amino acid to which the N-terminal methionine is covalently linked. 

From chemical synthesis

In addition, polypeptides of the invention, especially short protein fragments, can be chemically 
synthesized using techniques known in the art (See, e.g., Creighton, 1983; and Hunkapiller et al., 1984),
which disclosures are hereby incorporated by reference in their entireties. For example, a polypeptide
corresponding to a fragment of a polypeptide sequence of the invention can be synthesized by use of a
peptide synthesizer. A variety of methods of making polypeptides are known to those skilled in the art, 
including methods in which the carboxyl terminal amino acid is bound to polyvinyl benzene or another 
suitable resin. The amino acid to be added possesses blocking groups on its amino moiety and any side 
chain reactive groups so that only its carboxyl moiety can react. The carboxyl group is activated with
carbodiimide or another activating agent and allowed to couple to the immobilized amino acid. After 
removal of the blocking group, the cycle is repeated to generate a polypeptide having the desired 
sequence. Alternatively, the methods described in U.S. Patent No. 5,049,656, which disclosure is hereby 
incorporated by reference in its entirety, may be used. 

Furthermore, if desired, non-classical amino acids or chemical amino acid analogs can be
introduced as a substitution or addition into the polypeptide sequence. Non-classical amino acids include,
but are not limited to, the D-isomers of the common amino acids, 2,4-diaminobutyric acid, α-amino
isobutyric acid, 4-aminobutyric acid, Abu, 2-amino butyric acid, γ-Abu, ε-Ahx, 6-amino hexanoic acid,
Aib, 2-amino isobutyric acid, 3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline,
sarcosine, citrulline, homocitrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, fluoroamino acids, designer amino acids such as β-methyl amino acids,
Cα-methyl amino acids, Nα-methyl amino acids, and amino acid analogs in general. Furthermore, the
amino acid can be D (dextrorotary) or L (levorotary).

Modifications 

The invention encompasses polypeptides which are differentially modified during or after
translation, e.g., by glycosylation, acetylation, phosphorylation, amidation, derivatization by known
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule or other cellular ligand, 
etc. Any of numerous chemical modifications may be carried out by known techniques, including, but not
limited to, specific chemical cleavage by cyanogen bromide, trypsin, chymotrypsin, papain, V8 protease,
NaBH4; acetylation, formylation, oxidation, reduction; metabolic synthesis in the presence of
tunicamycin; etc. 

Additional post-translational modifications encompassed by the invention include, for example,
N-linked or O-linked carbohydrate chains, processing of N-terminal or C-terminal ends, attachment
of chemical moieties to the amino acid backbone, chemical modifications of N-linked or O-linked
carbohydrate chains, and addition or deletion of an N-terminal methionine residue as a result of
prokaryotic host cell expression. The polypeptides may also be modified with a detectable label, such as
an enzymatic, fluorescent, isotopic or affinity label to allow for detection and isolation of the protein.

Also provided by the invention are chemically modified derivatives of the polypeptides of the
invention which may provide additional advantages such as increased solubility, stability and circulating 
time of the polypeptide, or decreased immunogenicity. See U.S. Patent No. 4,179,337, which disclosure is hereby
incorporated by reference in its entirety. The chemical moieties for derivatization may be selected from
water soluble polymers such as polyethylene glycol, ethylene glycol/propylene glycol copolymers, 
carboxymethylcellulose, dextran, polyvinyl alcohol and the like. The polypeptides may be modified at 
random positions within the molecule, or at predetermined positions within the molecule and may include 
one, two, three or more attached chemical moieties. 

The polymer may be of any molecular weight, and may be branched or unbranched. For
polyethylene glycol, the preferred molecular weight is between about 1 kDa and about 100 kDa (the term
"about" indicating that in preparations of polyethylene glycol, some molecules will weigh more, some 
less, than the stated molecular weight) for ease in handling and manufacturing. Other sizes may be used, 
depending on the desired therapeutic profile (e.g., the duration of sustained release desired, the effects, if
any, on biological activity, the ease in handling, the degree or lack of antigenicity and other known effects
of the polyethylene glycol to a therapeutic protein or analog). 

The polyethylene glycol molecules (or other chemical moieties) should be attached to the protein 
with consideration of effects on functional or antigenic domains of the protein. There are a number of 
attachment methods available to those skilled in the art, e.g., EP 0 401 384 (coupling PEG to G-CSF),
and Malik et al. (1992) (reporting pegylation of GM-CSF using tresyl chloride), which disclosures are
hereby incorporated by reference in their entireties. For example, polyethylene glycol may be covalently 
bound through amino acid residues via a reactive group, such as a free amino or carboxyl group. Reactive
groups are those to which an activated polyethylene glycol molecule may be bound. The amino acid
residues having a free amino group may include lysine residues and the N-terminal amino acid residues;
those having a free carboxyl group may include aspartic acid residues, glutamic acid residues and the
C-terminal amino acid residue. Sulfhydryl groups may also be used as a reactive group for attaching the
polyethylene glycol molecules. Preferred for therapeutic purposes is attachment at an amino group, such 
as attachment at the N-terminus or lysine group. 

One may specifically desire proteins chemically modified at the N-terminus. Using polyethylene
glycol as an illustration of the present composition, one may select from a variety of polyethylene glycol
molecules (by molecular weight, branching, etc.), the proportion of polyethylene glycol molecules to 
protein (polypeptide) molecules in the reaction mix, the type of pegylation reaction to be performed, and 
the method of obtaining the selected N-terminally pegylated protein. The method of obtaining the 
N-terminally pegylated preparation (i.e., separating this moiety from other monopegylated moieties if
necessary) may be by purification of the N-terminally pegylated material from a population of pegylated
protein molecules. Selective chemical modification of proteins at the N-terminus may be
accomplished by reductive alkylation, which exploits differential reactivity of different types of primary
amino groups (lysine versus the N-terminal) available for derivatization in a particular protein. Under the
appropriate reaction conditions, substantially selective derivatization of the protein at the N-terminus with
a carbonyl group containing polymer is achieved. 

Multimerization 

The polypeptides of the invention may be monomers or multimers (i.e., dimers, trimers,
tetramers and higher multimers). Accordingly, the present invention relates to monomers and multimers
of the polypeptides of the invention, their preparation, and compositions containing them. In specific
embodiments, the polypeptides of the invention are monomers, dimers, trimers or tetramers. In additional 
embodiments, the multimers of the invention are at least dimers, at least trimers, or at least tetramers. 

Multimers encompassed by the invention may be homomers or heteromers. As used herein, the
term "homomer" refers to a multimer containing only polypeptides corresponding to the amino acid
sequences of SEQ ID Nos: 242-482 or encoded by the clone inserts of the deposited clone pool (including
fragments, variants, splice variants, and fusion proteins, corresponding to these polypeptides as described 
herein). These homomers may contain polypeptides having identical or different amino acid sequences. In 
a specific embodiment, a homomer of the invention is a multimer containing only polypeptides having an 
identical amino acid sequence. In another specific embodiment, a homomer of the invention is a multimer
containing polypeptides having different amino acid sequences. In specific embodiments, the multimer of
the invention is a homodimer (e.g., containing polypeptides having identical or different amino acid 
sequences) or a homotrimer (e.g., containing polypeptides having identical and/or different amino acid 
sequences). In additional embodiments, the homomeric multimer of the invention is at least a homodimer,
at least a homotrimer, or at least a homotetramer. 

As used herein, the term "heteromer" refers to a multimer containing one or more heterologous
polypeptides (i.e., polypeptides of different proteins) in addition to the polypeptides of the invention. In a
specific embodiment, the multimer of the invention is a heterodimer, a heterotrimer, or a heterotetramer. 
In additional embodiments, the heteromeric multimer of the invention is at least a heterodimer, at least a 
heterotrimer, or at least a heterotetramer. 
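
The naming scheme defined in the two preceding paragraphs can be sketched as a small classifier. Note that, under these definitions, a homomer is determined by all subunits being polypeptides of the invention, not by their having identical sequences. The function name and the boolean representation of each subunit are hypothetical and used only for illustration.

```python
_SIZE_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def classify_multimer(subunit_is_genset: list[bool]) -> str:
    """Name a complex from flags marking which subunits are GENSET polypeptides.

    True marks a polypeptide of the invention (identical or different
    sequences both qualify); False marks a heterologous polypeptide.
    """
    count = len(subunit_is_genset)
    if count == 0:
        raise ValueError("a complex needs at least one subunit")
    if not any(subunit_is_genset):
        raise ValueError("no polypeptide of the invention is present")
    if count == 1:
        return "monomer"
    prefix = "homo" if all(subunit_is_genset) else "hetero"
    return prefix + _SIZE_NAMES.get(count, "multimer")


if __name__ == "__main__":
    print(classify_multimer([True, True]))
    print(classify_multimer([True, False, False]))
```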

Multimers of the invention may be the result of hydrophobic, hydrophilic, ionic and/or covalent
associations and/or may be indirectly linked by, for example, liposome formation. Thus, in one
embodiment, multimers of the invention, such as, for example, homodimers or homotrimers, are formed 
when polypeptides of the invention contact one another in solution. In another embodiment,
heteromultimers of the invention, such as, for example, heterotrimers or heterotetramers, are formed when
polypeptides of the invention contact antibodies to the polypeptides of the invention (including antibodies 
to the heterologous polypeptide sequence in a fusion protein of the invention) in solution. In other 
embodiments, multimers of the invention are formed by covalent associations with and/or between the 
polypeptides of the invention. Such covalent associations may involve one or more amino acid residues
contained in the polypeptide sequence (e.g., that recited in the sequence listing, or contained in the
polypeptide encoded by a deposited clone). In one instance, the covalent associations are cross-linking 
between cysteine residues located within the polypeptide sequences, which interact in the native (i.e., 
naturally occurring) polypeptide. In another instance, the covalent associations are the consequence of 
chemical or recombinant manipulation. Alternatively, such covalent associations may involve one or
more amino acid residues contained in the heterologous polypeptide sequence in a fusion protein of the 
invention. 

In one example, covalent associations are between the heterologous sequence contained in a 
fusion protein of the invention (see, e.g., US Patent Number 5,478,925, which disclosure is hereby
incorporated by reference in its entirety). In a specific example, the covalent associations are between the
heterologous sequence contained in an Fc fusion protein of the invention (as described herein). In another 
specific example, covalent associations of fusion proteins of the invention are between heterologous 
polypeptide sequence from another protein that is capable of forming covalently associated multimers, 
such as, for example, osteoprotegerin (see, e.g., International Publication No: WO 98/49305, the contents
of which are herein incorporated by reference in their entirety). In another embodiment, two or more
polypeptides of the invention are joined through peptide linkers. Examples include those peptide linkers 
described in U.S. Pat. No. 5,073,627 (hereby incorporated by reference). Proteins comprising multiple 
polypeptides of the invention separated by peptide linkers may be produced using conventional 
recombinant DNA technology. 

Another method for preparing multimer polypeptides of the invention involves use of
polypeptides of the invention fused to a leucine zipper or isoleucine zipper polypeptide sequence. Leucine
zipper and isoleucine zipper domains are polypeptides that promote multimerization of the proteins in
which they are found. Leucine zippers were originally identified in several DNA-binding proteins, and
have since been found in a variety of different proteins (Landschulz et al., 1988). Among the known
leucine zippers are naturally occurring peptides and derivatives thereof that dimerize or trimerize.
Examples of leucine zipper domains suitable for producing soluble multimeric proteins of the invention
are those described in PCT application WO 94/10308, hereby incorporated by reference. Recombinant
fusion proteins comprising a polypeptide of the invention fused to a polypeptide sequence that dimerizes
or trimerizes in solution are expressed in suitable host cells, and the resulting soluble multimeric fusion
protein is recovered from the culture supernatant using techniques known in the art.

Trimeric polypeptides of the invention may offer the advantage of enhanced biological activity.
Preferred leucine zipper moieties and isoleucine zipper moieties are those that preferentially form trimers. One
example is a leucine zipper derived from lung surfactant protein D (SPD), as described in Hoppe et al.
(1994) and in U.S. patent application Ser. No. 08/446,922, which disclosure is hereby incorporated by
reference in its entirety. Other peptides derived from naturally occurring trimeric proteins may be
employed in preparing trimeric polypeptides of the invention. In another example, proteins of the
invention are associated by interactions between Flag® polypeptide sequence contained in fusion proteins
of the invention containing Flag® polypeptide sequence. In a further embodiment, proteins
of the invention are associated by interactions between heterologous polypeptide sequence contained in
Flag® fusion proteins of the invention and anti-Flag® antibody.

The multimers of the invention may be generated using chemical techniques known in the art. For
example, polypeptides desired to be contained in the multimers of the invention may be chemically
cross-linked using linker molecules and linker molecule length optimization techniques known in the art
(see, e.g., US Patent Number 5,478,925, which is herein incorporated by reference in its entirety).
Additionally, multimers of the invention may be generated using techniques known in the art to form one
or more inter-molecule cross-links between the cysteine residues located within the sequence of the
polypeptides desired to be contained in the multimer (see, e.g., US Patent Number 5,478,925, which is
herein incorporated by reference in its entirety). Further, polypeptides of the invention may be routinely
modified by the addition of cysteine or biotin to the C-terminus or N-terminus of the polypeptide and
techniques known in the art may be applied to generate multimers containing one or more of these
modified polypeptides (see, e.g., US Patent Number 5,478,925, which is herein incorporated by reference
in its entirety). Additionally, techniques known in the art may be applied to generate liposomes
containing the polypeptide components desired to be contained in the multimer of the invention (see, e.g.,
US Patent Number 5,478,925, which is herein incorporated by reference in its entirety).

Alternatively, multimers of the invention may be generated using genetic engineering techniques
known in the art. In one embodiment, polypeptides contained in multimers of the invention are produced
recombinantly using fusion protein technology described herein or otherwise known in the art (see, e.g.,
US Patent Number 5,478,925, which is herein incorporated by reference in its entirety). In a specific
embodiment, polynucleotides coding for a homodimer of the invention are generated by ligating a
polynucleotide sequence encoding a polypeptide of the invention to a sequence encoding a linker
polypeptide and then further to a synthetic polynucleotide encoding the translated product of the
polypeptide in the reverse orientation from the original C-terminus to the N-terminus (lacking the leader
sequence) (see, e.g., US Patent Number 5,478,925, which is herein incorporated by reference in its
entirety). In another embodiment, recombinant techniques described herein or otherwise known in the art
are applied to generate recombinant polypeptides of the invention which contain a transmembrane domain
(or hydrophobic or signal peptide) and which can be incorporated by membrane reconstitution techniques
into liposomes (see, e.g., US Patent Number 5,478,925, which is herein incorporated by reference in its
entirety).

Mutated polypeptides 

To improve or alter the characteristics of GENSET polypeptides of the present invention, protein
engineering may be employed. Recombinant DNA technology known to those skilled in the art can be
used to create novel mutant proteins or muteins including single or multiple amino acid substitutions,
deletions, additions, or fusion proteins. Such modified polypeptides can show, e.g., increased/decreased
biological activity or increased/decreased stability. In addition, they may be purified in higher yields and
show better solubility than the corresponding natural polypeptide, at least under certain purification and
storage conditions. Further, the polypeptides of the present invention may be produced as multimers
including dimers, trimers and tetramers. Multimerization may be facilitated by linkers or recombinantly
through heterologous polypeptides such as Fc regions.

N- and C-terminal deletions 

It is known in the art that one or more amino acids may be deleted from the N-terminus or
C-terminus without substantial loss of biological function. For instance, Ron et al. (1993) reported modified
KGF proteins that had heparin binding activity even if 3, 8, or 27 N-terminal amino acid residues were
missing. Accordingly, the present invention provides polypeptides having one or more residues deleted
from the amino terminus of the polypeptides of SEQ ID Nos: 242-482 or that encoded by the clone inserts
of the deposited clone pool. Similarly, many examples of biologically functional C-terminal deletion
mutants are known. For instance, interferon gamma shows up to ten times higher activity when 8 to
10 amino acid residues are deleted from the C-terminus of the protein (See, e.g., Dobeli et al., 1988), which
disclosure is hereby incorporated by reference in its entirety. Accordingly, the present invention provides
polypeptides having one or more residues deleted from the carboxy terminus of the polypeptides
of SEQ ID Nos: 242-482 or encoded by the clone inserts of the deposited clone pool. The invention also
provides polypeptides having one or more amino acids deleted from both the amino and the carboxyl
termini as described below.

Other mutations

Other mutants in addition to N- and C-terminal deletion forms of the protein discussed above are 
included in the present invention. It also will be recognized by one of ordinary skill in the art that some 
amino acid sequences of the GENSET polypeptides of the present invention can be varied without 
significant effect on the structure or function of the protein. If such differences in sequence are
contemplated, it should be remembered that there will be critical areas on the protein which determine 
activity. Thus, the invention further includes variations of the GENSET polypeptides which show 
substantial GENSET polypeptide activity. Such mutants include deletions, insertions, inversions, repeats, 
and substitutions selected according to general rules known in the art so as to have little effect on activity. 
For example, guidance concerning how to make phenotypically silent amino acid substitutions is
provided. 

There are two main approaches for studying the tolerance of an amino acid sequence to change 
(See, Bowie et al., 1994), which disclosure is hereby incorporated by reference in its entirety. The first
method relies on the process of evolution, in which mutations are either accepted or rejected by natural
selection.

The second approach uses genetic engineering to introduce amino acid changes at specific 
positions of a cloned gene and selections or screens to identify sequences that maintain functionality. 
These studies have revealed that proteins are surprisingly tolerant of amino acid substitutions. The studies 
indicate which amino acid changes are likely to be permissive at a certain position of the protein. For
example, most buried amino acid residues require nonpolar side chains, whereas few features of surface
side chains are generally conserved. Other such phenotypically silent substitutions are described by 
Bowie et al. (supra) and the references cited therein. 

Typically seen as conservative substitutions are the replacements, one for another, among the
aliphatic amino acids Ala, Val, Leu and Phe; interchange of the hydroxyl residues Ser and Thr; exchange
of the acidic residues Asp and Glu; substitution between the amide residues Asn and Gln; exchange of the
basic residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr. Thus, the fragment,
derivative, analog, or homologue of the polypeptide of the present invention may be, for example: (i) one
in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino
acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or
may not be one encoded by the genetic code; or (ii) one in which one or more of the amino acid residues
includes a substituent group; or (iii) one in which the GENSET polypeptide is fused with another
compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene
glycol); or (iv) one in which the additional amino acids are fused to the above form of the polypeptide,
such as an IgG Fc fusion region peptide or leader or secretory sequence or a sequence which is employed
for purification of the above form of the polypeptide or a pro-protein sequence. Such fragments,
derivatives and analogs are deemed to be within the scope of those skilled in the art from the teachings
herein.

Thus, the GENSET polypeptides of the present invention may include one or more amino acid 
substitutions, deletions, or additions, either from natural mutations or human manipulation. As indicated,
changes are preferably of a minor nature, such as conservative amino acid substitutions that do not
significantly affect the folding or activity of the protein. The following groups of amino acids generally
represent equivalent changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3)
Val, Ile, Leu, Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
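
As an illustration only, the five groups of equivalent changes listed above can be written as a lookup. The following is a minimal sketch in Python, assuming standard three-letter residue codes (Gln, Ile); the names EQUIVALENT_GROUPS and is_equivalent_change are hypothetical and are not part of the specification.

```python
# Groups of residues said above to "generally represent equivalent changes".
# Three-letter codes; a residue may belong to more than one group.
EQUIVALENT_GROUPS = [
    {"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"},  # group (1)
    {"Cys", "Ser", "Tyr", "Thr"},                                      # group (2)
    {"Val", "Ile", "Leu", "Met", "Ala", "Phe"},                        # group (3)
    {"Lys", "Arg", "His"},                                             # group (4)
    {"Phe", "Tyr", "Trp", "His"},                                      # group (5)
]


def is_equivalent_change(original: str, replacement: str) -> bool:
    """Return True if both residues appear together in at least one group."""
    return any(original in group and replacement in group
               for group in EQUIVALENT_GROUPS)
```

Under this sketch, a Lys to Arg change falls within group (4), whereas a Lys to Asp change shares no group and would not be counted as an equivalent change.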

A specific embodiment of a modified GENSET peptide molecule of interest according to the
present invention includes, but is not limited to, a peptide molecule which is resistant to proteolysis, namely a
peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH) reduced bond, a
(NHCO) retro-inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S) thiomethylene bond, a
(CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a (CHOH-CH2) hydroxyethylene bond, a
(N-N) bond, an E-alkene bond or also a -CH=CH- bond. The invention also encompasses a human GENSET
polypeptide or a fragment or a variant thereof in which at least one peptide bond has been modified as
described above.

Amino acids in the GENSET proteins of the present invention that are essential for function can
be identified by methods known in the art, such as site-directed mutagenesis or alanine-scanning
mutagenesis (See, e.g., Cunningham et al., 1989), which disclosure is hereby incorporated by reference in
its entirety. The latter procedure introduces single alanine mutations at every residue in the molecule.
The resulting mutant molecules are then tested for biological activity using assays appropriate for
measuring the function of the particular protein. Of special interest are substitutions of charged amino
acids with other charged or neutral amino acids which may produce proteins with highly desirable
improved characteristics, such as less aggregation. Aggregation may not only reduce activity but also be
problematic when preparing pharmaceutical formulations, because aggregates can be immunogenic (See,
e.g., Pinckard et al., 1967; Robbins et al., 1987; and Cleland et al., 1993).
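
By way of illustration, the set of single-alanine mutants used in alanine-scanning mutagenesis can be enumerated in silico before any construct is made. The sketch below is written in Python and assumes one-letter residue codes; the function name alanine_scan is hypothetical, and skipping positions that are already alanine is an assumption of this sketch rather than a statement of the cited method.

```python
def alanine_scan(sequence: str) -> list[tuple[str, str]]:
    """Return (label, mutant_sequence) pairs, one per non-alanine position.

    Labels use the common notation <original residue><1-based position>A.
    """
    mutants = []
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue  # assumption: an existing alanine yields no distinct mutant
        label = f"{residue}{index + 1}A"
        mutant = sequence[:index] + "A" + sequence[index + 1:]
        mutants.append((label, mutant))
    return mutants
```

For the hypothetical tripeptide MKA this yields the mutants M1A (AKA) and K2A (MAA); each mutant would then be tested for biological activity as described above.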

A further embodiment of the invention relates to a polypeptide which comprises the amino acid 
sequence of a GENSET polypeptide having an amino acid sequence which contains at least one
conservative amino acid substitution, but not more than 50 conservative amino acid substitutions, not
more than 40 conservative amino acid substitutions, not more than 30 conservative amino acid 
substitutions, and not more than 20 conservative amino acid substitutions. Also provided are polypeptides 
which comprise the amino acid sequence of a GENSET polypeptide, having at least one, but not more 
than 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 conservative amino acid substitutions.

Polypeptide fragments

Structural definition

The present invention is further directed to fragments of the amino acid sequences described 
herein such as the polypeptides of SEQ ID Nos: 242-482, mature polypeptides included in SEQ ID Nos: 
242-272 and 274-384, or full-length or mature polypeptides encoded by the clone inserts of the deposited
clone pool. More specifically, the present invention embodies purified, isolated, and recombinant 
polypeptides comprising at least 6, preferably at least 8 to 10, more preferably 12, 15, 20, 25, 30, 35, 40, 
50, 60, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 350, 400, 450 or 500 consecutive amino acids of 
a polypeptide selected from the group consisting of the sequences of SEQ ID Nos: 242-482, mature
polypeptides included in SEQ ID Nos: 242-272 and 274-384, and full-length or mature polypeptides
encoded by the clone inserts of the deposited clone pool, and other polypeptides of the present invention. 

In addition to the above polypeptide fragments, further preferred sub-genuses of polypeptides
comprise at least 6 amino acids, wherein "at least 6" is defined as any integer between 6 and the integer
representing the C-terminal amino acid of the polypeptide of the present invention including the
polypeptide sequences of the sequence listing below. Further included are species of polypeptide
fragments at least 6 amino acids in length, as described above, that are further specified in terms of their
N-terminal and C-terminal positions. However, included in the present invention as individual species are
all polypeptide fragments, at least 6 amino acids in length, as described above, which may be particularly
specified by an N-terminal and C-terminal position. That is, every combination of an N-terminal and
C-terminal position that a fragment at least 6 contiguous amino acid residues in length could occupy, on
any given amino acid sequence of the sequence listing or of the present invention, is included in the
present invention.

The present invention also provides for the exclusion of any fragment species specified by 
N-terminal and C-terminal positions or of any fragment sub-genus specified by size in amino acid
residues as described above. Any number of fragments specified by N-terminal and C-terminal positions
or by size in amino acid residues as described above may be excluded as individual species. 

The above polypeptide fragments of the present invention can be immediately envisaged using 
the above description and are therefore not individually listed solely for the purpose of not unnecessarily 
lengthening the specification. Moreover, the above fragments need not have a GENSET biological
activity, although polypeptides having these activities are preferred embodiments of the invention, since
they would be useful, for example, in immunoassays, in epitope mapping, epitope tagging, as vaccines, 
and as molecular weight markers. The above fragments may also be used to generate antibodies to a 
particular portion of the polypeptide. These antibodies can then be used in immunoassays well known in
the art to distinguish between human and non-human cells and tissues or to determine whether cells or
tissues in a biological sample are or are not of the same type which express the polypeptides of the 
present invention. 

It is noted that the above species of polypeptide fragments of the present invention may
alternatively be described by the formula "a to b"; where "a" equals the N-terminal most amino acid
position and "b" equals the C-terminal most amino acid position of the polypeptide fragment; and further where
"a" equals an integer between 1 and the number of amino acids of the polypeptide sequence of the present
invention minus 6, and where "b" equals an integer between 7 and the number of amino acids of the
polypeptide sequence of the present invention; and where "a" is an integer smaller than "b" by at least 6.
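
The "a to b" formula lends itself to direct enumeration. The following Python sketch applies the three constraints exactly as worded above (a between 1 and the sequence length minus 6, b between 7 and the sequence length, and a smaller than b by at least 6); the function names are hypothetical. Note that, read literally, these constraints give a shortest span running from position a to position a + 6 when both end positions are counted.

```python
from typing import Iterator


def fragment_positions(length: int) -> Iterator[tuple[int, int]]:
    """Yield every (a, b) pair, 1-based, allowed by the formula as worded."""
    for a in range(1, length - 6 + 1):               # 1 <= a <= length - 6
        for b in range(max(7, a + 6), length + 1):   # 7 <= b <= length, b - a >= 6
            yield a, b


def fragment(sequence: str, a: int, b: int) -> str:
    """Return the residues from position a to position b, both inclusive."""
    return sequence[a - 1:b]
```

For a hypothetical polypeptide of 8 residues, the formula yields the three species 1 to 7, 1 to 8 and 2 to 8. In general the number of such species grows quadratically with sequence length, which is consistent with the statement that the fragments are not individually listed.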

The present invention also provides for the exclusion of any species of polypeptide fragments of
the present invention specified by 5' and 3' positions or sub-genuses of polypeptides specified by size in
amino acids as described above. Any number of fragments specified by 5' and 3' positions or by size in
amino acids, as described above, may be excluded. Specifically excluded from the invention are the
polypeptide fragments encoded by the preferentially excluded polynucleotide fragments described in
Table IV, and in Tables Va and Vb. Table IV and Tables Va and Vb provide for the exclusion of
polypeptides, independently from each other, in addition to those described elsewhere in the specification
and are therefore not meant as a limiting description.

Functional definition 

Preferred polypeptide fragments of the invention are isolated, purified or recombinant 
polypeptides comprising, consisting of, or consisting essentially of signal peptides, preferably signal
peptides selected from the group consisting of SEQ ID Nos: 242-272 and 274-384, signal peptides 
encoded by sequences of SEQ ID Nos: 1-31 and 33-143 and those encoded by the clone inserts of the 
deposited clone pool. Such polypeptide fragments are useful to design secretion vectors as described
elsewhere in the application.

Other preferred polypeptide fragments of the invention are isolated, purified or recombinant
polypeptides comprising, consisting of, or consisting essentially of mature proteins, preferably mature
proteins selected from the group consisting of SEQ ID Nos: 242-272 and 274-384, mature proteins 
encoded by sequences of SEQ ID Nos: 1-31 and 33-143 and those encoded by the clone inserts of the 
deposited clone pool. 

Domains

Preferred polypeptide fragments of the invention are domains of polypeptides of the invention.
Such domains may eventually comprise linear or structural motifs and signatures including, but not
limited to, leucine zippers, helix-turn-helix motifs, post-translational modification sites such as
glycosylation sites, ubiquitination sites, alpha helices, and beta sheets, signal sequences encoding signal
peptides which direct the secretion of the encoded proteins, sequences implicated in transcription
regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and
enzymatic cleavage sites. Such domains may present a particular biological activity such as DNA or
RNA-binding, secretion of proteins, transcription regulation, enzymatic activity, substrate binding
activity, etc.

A domain generally has a size of between 3 and 2000 amino acids. In a preferred
embodiment, domains comprise a number of amino acids that is any integer between 6 and 500. Domains
may be synthesized using any methods known to those skilled in the art, including those disclosed herein,
particularly in the section entitled "Preparation of the polypeptides of the invention". Methods for 
determining the amino acids which make up a domain with a particular biological activity include 
mutagenesis studies and assays to determine the biological activity to be tested. 

Alternatively, the polypeptides of the invention may be scanned for motifs, domains and/or
signatures in databases using any computer method known to those skilled in the art. Searchable
databases include Prosite (Hofmann et al., 1999; Bucher and Bairoch, 1994), Pfam (Sonnhammer et al.,
1997; Henikoff et al., 2000; Bateman et al., 2000), Blocks (Henikoff et al., 2000), Prints (Attwood et al.,
1996), Prodom (Sonnhammer and Kahn, 1994; Corpet et al., 2000), Sbase (Pongor et al., 1993; Murvai et
al., 2000), Smart (Schultz et al., 1998), Dali/FSSP (Holm and Sander, 1996, 1997 and 1999), HSSP
(Sander and Schneider, 1991), CATH (Orengo et al., 1997; Pearl et al., 2000), SCOP (Murzin et al., 1995;
Lo Conte et al., 2000), COG (Tatusov et al., 1997 and 2000), specific family databases and derivatives
thereof (Nevill-Manning et al., 1998; Yona et al., 1999; Attwood et al., 2000), each of which disclosures
are hereby incorporated by reference in their entireties. For a review on available databases, see issue 1
of volume 28 of Nucleic Acids Research (2000), which disclosure is hereby incorporated by reference in
its entirety.

The polypeptides of SEQ ID Nos: 242-482 were screened for the presence of known structural
or functional motifs or for the presence of signatures, small amino acid sequences that are well conserved
amongst the members of a protein family. The search was conducted on the Pfam 5.5 database using
HMMER-2.1.1 (for info see Sonnhammer and Durbin, http://www.sanger.ac.uk/Pfam/), on a Blocks Plus
database containing Blocks version 12.0, Prints version 26.0, Pfam version 5.3, Prodom version 99.1, and
Domo version 2.0 using emotif (for info see Nevill-Manning et al., PNAS, 95, 5865-5871, (1998),
http://motif.stanford.edu/EMOTIF) and on the Prosite 16.0 database using bla (Tatusov, R. L. & Koonin,
E. V. CABIOS 10, No. 4) and pfscan (http://www.isrec.isb-sib.ch/cgi-bin/man.cgi?section=1&topic=pfscan).
Some of these predicted domains are described in Table VI. For
these polypeptides referred to by their sequence identification numbers (column entitled "Seq Id No"),
Table VI gives the designation of the domain (column entitled "Designation of domain") according to the
database of domains indicated in the column entitled "Database" and the positions of preferred fragments
within these sequences (column entitled "Positions of domains"). Each fragment is represented by a-b where
a and b are the start and end positions respectively of a given preferred fragment on the full-length
polypeptide. Preferred fragments are separated from each other by a comma. As used herein, the term "domain
described in Table VI" refers to all the domains listed in Table VI for a given GENSET protein referred to by
its sequence identification number in the first column. It should be noted that in Table VI, the first
methionine encountered is designated as amino acid number 1, i.e., the leader sequence is not numbered
negatively. In the appended sequence listing, the first amino acid of the mature protein resulting from
cleavage of the signal peptide is designated as amino acid number 1 and the first amino acid of the signal
peptide is designated with the appropriate negative number, in accordance with the regulations governing
sequence listings.
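
Because Table VI and the sequence listing use different numbering conventions, positions read from one must be shifted before being applied to the other. The Python sketch below parses a comma-separated "Positions of domains" entry and converts a Table VI position into sequence-listing numbering. The function names, the example values, and the signal peptide length passed as signal_length are hypothetical; the sketch also assumes that the listing has no position 0, so that the last residue of the signal peptide is numbered -1.

```python
def parse_ranges(field: str) -> list[tuple[int, int]]:
    """Parse an entry such as '12-45, 60-98' into [(12, 45), (60, 98)]."""
    ranges = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma or empty entry
        start, end = item.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def to_listing_number(position: int, signal_length: int) -> int:
    """Convert a Table VI position (first Met = 1) to sequence-listing numbering.

    Residues of the mature protein are numbered from 1; residues of the signal
    peptide receive negative numbers ending at -1 (assumed: no position 0).
    """
    if position > signal_length:
        return position - signal_length
    return position - signal_length - 1
```

For example, with a hypothetical signal peptide of 20 residues, Table VI position 21 corresponds to position 1 of the sequence listing, position 20 to -1, and position 1 to -20.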

Consequently, preferred polypeptide fragments of the invention are domains of the
polypeptides of SEQ ID Nos: 242-482. Therefore, the present invention encompasses isolated, purified,
or recombinant polypeptides which consist of, consist essentially of, or comprise a contiguous span of at
least 6, preferably at least 8 to 10, more preferably 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 125, 150,
175, 200, 225, 250, 275, 300, 350, 400, 450 or 500 amino acids of a sequence selected from the group
consisting of the sequences of SEQ ID Nos: 242-482, to the extent that a contiguous span of these lengths
is consistent with the lengths of said selected sequence, where said contiguous span comprises at least 1,
2, 3, 5, or 10 amino acid positions of a domain described in Table VI of said selected sequence. The
present invention also encompasses isolated, purified, or recombinant polypeptides comprising, consisting
essentially of, or consisting of a contiguous span of at least 6, preferably at least 8 to 10, more preferably
12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 350, 400, 450 or 500
amino acids of a sequence selected from the group consisting of the sequences of SEQ ID Nos: 242-482,
to the extent that a contiguous span of these lengths is consistent with the lengths of said selected
sequence, where said contiguous span is a domain described in Table VI of said selected sequence. The
present invention also encompasses isolated, purified, or recombinant polypeptides which comprise,
consist of or consist essentially of a domain described in Table VI of a sequence selected from the group
consisting of the sequences of SEQ ID Nos: 242-482.
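
Whether a given contiguous span "comprises at least 1, 2, 3, 5, or 10 amino acid positions of a domain" reduces to an interval overlap. A minimal Python sketch follows, assuming that both the span and the domain are given as inclusive (start, end) positions in the same numbering; the function names and example positions are hypothetical.

```python
def shared_positions(span: tuple[int, int], domain: tuple[int, int]) -> int:
    """Count the amino acid positions common to two inclusive ranges."""
    span_start, span_end = span
    domain_start, domain_end = domain
    overlap = min(span_end, domain_end) - max(span_start, domain_start) + 1
    return max(0, overlap)


def span_qualifies(span: tuple[int, int], domain: tuple[int, int],
                   minimum_shared: int = 1) -> bool:
    """Return True if the span contains at least minimum_shared domain positions."""
    return shared_positions(span, domain) >= minimum_shared
```

For instance, a span covering positions 10 to 20 shares three positions (18, 19 and 20) with a domain at positions 18 to 40, so it satisfies the "at least 3" threshold but not the "at least 5" threshold.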

Polypeptides of the present invention that are not specifically described in this table are not 
considered as not belonging to a domain. This is because they may still be not recognized as such by the 
particular algorithms used or not be included in the particular database searched. In fact, all fragments of 
the polypeptides of the present invention, at least 6 amino acid residues in length, are included in the
present invention as being a domain. Amino acid residues comprising other domains may be determined
by looking in other databases than the ones currently cited to establish Table VI. The domains of the
present invention preferably comprise 6 to 200 amino acids (i.e., any integer between 6 and 200,
inclusive) of a polypeptide of the present invention. Also included in the present invention are domain
fragments between the integers of 6 and the full-length GENSET sequence of the sequence listing. All
combinations of sequences between the integers of 6 and the full-length sequence of a GENSET 
polypeptide are included. The domain fragments may be specified by either the number of contiguous 
amino acid residues (as a sub-genus) or by specific N-terminal and C-terminal positions (as species) as 
described above for the polypeptide fragments of the present invention. Any number of domain 
fragments of the present invention may also be excluded in the same manner.

Epitopes and Antibody Fusions: 

A preferred embodiment of the present invention is directed to epitope-bearing polypeptides and
epitope-bearing polypeptide fragments. These epitopes may be "antigenic epitopes" or both an "antigenic
epitope" and an "immunogenic epitope". An "immunogenic epitope" is defined as a part of a protein that
elicits an antibody response in vivo when the polypeptide is the immunogen. On the other hand, a region
of polypeptide to which an antibody binds is defined as an "antigenic determinant" or "antigenic epitope."
The number of immunogenic epitopes of a protein generally is less than the number of antigenic epitopes
(See, e.g., Geysen et al., 1984), which disclosure is hereby incorporated by reference in its entirety. It is
particularly noted that although a particular epitope may not be immunogenic, it is nonetheless useful
since antibodies can be made to both immunogenic and antigenic epitopes.

An epitope can comprise as few as 3 amino acids in a spatial conformation, which is unique to the 
epitope. Generally an epitope consists of at least 6 such amino acids, and more often at least 8-10 such 
amino acids. In a preferred embodiment, antigenic epitopes comprise a number of amino acids that is any
integer between 3 and 50. Fragments which function as epitopes may be produced by any conventional
means (See, e.g., Houghten, 1985), also further described in U.S. Patent No. 4,631,21, which disclosures
are hereby incorporated by reference in their entireties. Methods for determining the amino acids which 
make up an epitope include x-ray crystallography, 2-dimensional nuclear magnetic resonance, and epitope 
mapping, e.g., the Pepscan method described by Geysen et al. (1984); PCT Publication No. WO 
84/03564; and PCT Publication No. WO 84/03506, which disclosures are hereby incorporated by
reference in their entireties. Another example is the algorithm of Jameson and Wolf (1988) (said
reference incorporated by reference in its entirety). The Jameson-Wolf antigenic analysis, for example,
may be performed using the computer program PROTEAN, using default parameters (Version 4.0
Windows, DNASTAR, Inc., 1228 South Park Street, Madison, WI).

Antigenic epitopes predicted by the Jameson-Wolf algorithm for the polypeptides of SEQ ID
Nos: 242-482 are presented in Table VII. For each GENSET polypeptide referred to by its sequence 
identification number in the column entitled "Seq Id No", a list of antigenic epitopes is given in the 
column entitled "Epitopes", each epitope being separated by a comma. Each fragment is represented by a-b
where a and b are the start and end positions respectively of a given preferred fragment. It should be noted
that in Table VII, the first methionine encountered is designated as amino acid number 1, i.e., the leader
sequence is not numbered negatively. In the appended sequence listing, the first amino acid of the mature
protein resulting from cleavage of the signal peptide is designated as amino acid number 1 and the first
amino acid of the signal peptide is designated with the appropriate negative number, in accordance with
the regulations governing sequence listings. As used herein, the term "epitope described in Table VII"
refers to all preferred polypeptide fragments described in the second column of Table VII for a GENSET
polypeptide referred to by its sequence identification number in the first column. It is pointed out that the
immunogenic epitopes listed in Table VII describe only amino acid residues comprising epitopes
predicted to have the highest degree of immunogenicity by a particular algorithm. Polypeptides of the
present invention that are not specifically described as immunogenic are not considered non-antigenic.
This is because they may still be antigenic in vivo but merely not recognized as such by the particular
algorithm used. Alternatively, the polypeptides are most likely antigenic in vitro using methods such as
phage display. Thus, listed in Table VII are the amino acid residues comprising only preferred epitopes,
not a complete list. In fact, all fragments of the polypeptides of the present invention, at least 6 amino
acid residues in length, are included in the present invention as being useful as antigenic epitopes. Amino
acid residues comprising other immunogenic epitopes may be determined by algorithms similar to the 
Jameson-Wolf analysis or by in vivo testing for an antigenic response using the methods described herein 
or those known in the art. 
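
The a-b notation and the two numbering conventions described above can be illustrated with a minimal Python sketch. The function names, the example peptide, and the signal-peptide length used below are hypothetical and serve only to show the arithmetic; they are not taken from Table VII or from the sequence listing.

```python
from typing import List, Tuple


def parse_epitopes(field: str) -> List[Tuple[int, int]]:
    """Parse an "Epitopes" entry such as "3-8, 12-15" into (start, end) pairs."""
    ranges = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        start, end = (int(x) for x in item.split("-"))
        if start < 1 or end < start:
            raise ValueError(f"invalid epitope range: {item!r}")
        ranges.append((start, end))
    return ranges


def epitope_fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive).

    Position 1 is the first methionine, as in Table VII.
    """
    if end > len(sequence):
        raise ValueError("epitope extends past the end of the sequence")
    return sequence[start - 1:end]


def to_listing_position(position: int, signal_length: int) -> int:
    """Convert a Table VII position to sequence-listing numbering.

    In the listing, the first residue of the mature protein is 1 and the
    signal peptide is numbered negatively; there is no position 0.
    signal_length is assumed to be known for the polypeptide in question.
    """
    if position > signal_length:
        return position - signal_length
    return position - signal_length - 1


# Hypothetical 10-residue peptide, for illustration only.
peptide = "MKTLLLAVLC"
for a, b in parse_epitopes("3-8"):
    print(a, b, epitope_fragment(peptide, a, b))
```

With an assumed signal peptide of 20 residues, Table VII position 21 corresponds to position 1 of the sequence listing, and Table VII position 20 corresponds to position -1.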

Therefore, the present invention encompasses isolated, purified, or recombinant polypeptides 
which consist of, consist essentially of, or comprise a contiguous span of at least 6, preferably at least 8 to 
10, more preferably 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 
350, 400, 450 or 500 amino acids of a sequence selected from the group consisting of the sequences of 
SEQ ID Nos: 242-482, to the extent that a contiguous span of these lengths is consistent with the lengths 
of said selected sequence, where said contiguous span comprises at least 1, 2, 3, 5, or 10 amino acid 
positions of an epitope described in Table VII of said selected sequence. The present invention also 
encompasses isolated, purified, or recombinant polypeptides comprising, consisting essentially of, or 
consisting of a contiguous span of at least 6, preferably at least 8 to 10, more preferably 12, 15, 20, 25, 30, 
35, 40, 50, 60, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 350, 400, 450 or 500 amino acids of a 
sequence selected from the group consisting of the sequences of SEQ ID Nos: 242-482, to the extent that 
a contiguous span of these lengths is consistent with the lengths of said selected sequence, where said 
contiguous span is an epitope described in Table VII of said selected sequence. The present invention 
also encompasses isolated, purified, or recombinant polypeptides which comprise, consist of or consist 
essentially of an epitope described in Table VII of a sequence selected from the group consisting of the 
sequences of SEQ ID Nos: 242-482. 
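
The relationship between a contiguous span and an epitope in the preceding paragraph can be made concrete with a short Python sketch. It enumerates every span of a given length that covers at least a given number of positions of one epitope. The function names and the numeric example are hypothetical; positions are 1-based and inclusive, following the Table VII convention.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), 1-based and inclusive


def overlap(span: Range, epitope: Range) -> int:
    """Number of amino acid positions shared by a span and an epitope."""
    lo = max(span[0], epitope[0])
    hi = min(span[1], epitope[1])
    return max(0, hi - lo + 1)


def qualifying_spans(seq_len: int, span_len: int,
                     epitope: Range, min_positions: int) -> List[Range]:
    """List all contiguous spans of span_len residues that comprise at
    least min_positions positions of the epitope.

    Returns an empty list when span_len exceeds the sequence length,
    i.e. when a span of that length is not consistent with the sequence.
    """
    if span_len > seq_len:
        return []
    spans = []
    for start in range(1, seq_len - span_len + 2):
        span = (start, start + span_len - 1)
        if overlap(span, epitope) >= min_positions:
            spans.append(span)
    return spans


# Hypothetical case: 20-residue sequence, epitope 10-14,
# 6-residue spans covering at least 3 epitope positions.
print(qualifying_spans(20, 6, (10, 14), 3))
```

In this hypothetical case the qualifying spans run from 7-12 through 12-17.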

The epitope-bearing fragments of the present invention preferably comprise 6 to 50 amino acids 
(i.e., any integer between 6 and 50, inclusive) of a polypeptide of the present invention. Also included in 
the present invention are antigenic fragments between the integers of 6 and the full-length GENSET 
sequence of the sequence listing. All combinations of sequences between the integers of 6 and the 
full-length sequence of a GENSET polypeptide are included. The epitope-bearing fragments may be specified 
by either the number of contiguous amino acid residues (as a sub-genus) or by specific N-terminal and 
C-terminal positions (as species) as described above for the polypeptide fragments of the present invention. 
Any number of epitope-bearing fragments of the present invention may also be excluded in the same 
manner. 

Antigenic epitopes are useful, for example, to raise antibodies, including monoclonal antibodies, 
that specifically bind the epitope (See, Wilson et al., 1984; and Sutcliffe et al., 1983), which disclosures 
are hereby incorporated by reference in their entireties. The antibodies are then used in various 
techniques such as diagnostic and tissue/cell identification techniques, as described herein, and in 
purification methods such as immunoaffinity chromatography. 

An antibody or other compound that specifically binds to a polypeptide or polynucleotide of the 
invention is also said to "selectively recognize" the polypeptide or polynucleotide. 

Similarly, immunogenic epitopes can be used to induce antibodies according to methods well 
known in the art (See, Sutcliffe et al., supra; Wilson et al., supra; Chow et al., (1985); and Bittle et al., 
(1985), which disclosures are hereby incorporated by reference in their entireties). A preferred 
immunogenic epitope includes the natural GENSET protein. The immunogenic epitopes may be 
presented together with a carrier protein, such as an albumin, to an animal system (such as rabbit or 
mouse) or, if they are long enough (at least about 25 amino acids), without a carrier. However, immunogenic 
epitopes comprising as few as 8 to 10 amino acids have been shown to be sufficient to raise antibodies 
capable of binding to, at the very least, linear epitopes in a denatured polypeptide (e.g., in Western 
blotting). 

Epitope-bearing polypeptides of the present invention are used to induce antibodies according to 
methods well known in the art including, but not limited to, in vivo immunization, in vitro immunization, 
and phage display methods (See, e.g., Sutcliffe et al., supra; Wilson et al., supra; and Bittle et al., 
supra). If in vivo immunization is used, animals may be immunized with free peptide; however, 
anti-peptide antibody titer may be boosted by coupling of the peptide to a macromolecular carrier, such as 
keyhole limpet hemocyanin (KLH) or tetanus toxoid. For instance, peptides containing cysteine residues 
may be coupled to a carrier using a linker such as m-maleimidobenzoyl-N-hydroxysuccinimide ester 
(MBS), while other peptides may be coupled to carriers using a more general linking agent such as 
glutaraldehyde. Animals such as rabbits, rats and mice are immunized with either free or carrier-coupled 
peptides, for instance, by intraperitoneal and/or intradermal injection of emulsions containing about 100 
µg of peptide or carrier protein and Freund's adjuvant. Several booster injections may be needed, for 
instance, at intervals of about two weeks, to provide a useful titer of anti-peptide antibody, which can be 
detected, for example, by ELISA using free peptide adsorbed to a solid surface. The titer of 
anti-peptide antibodies in serum from an immunized animal may be increased by selection of anti-peptide 
antibodies, for instance, by adsorption to the peptide on a solid support and elution of the selected 
antibodies according to methods well known in the art. 

As one of skill in the art will appreciate, and as discussed above, the polypeptides of the present 
invention comprising an immunogenic or antigenic epitope can be fused to heterologous polypeptide 
sequences. For example, the polypeptides of the present invention may be fused with the constant domain 
of immunoglobulins (IgA, IgE, IgG, IgM), or portions thereof (CH1, CH2, CH3, any combination thereof 
including both entire domains and portions thereof) resulting in chimeric polypeptides. These fusion 
proteins facilitate purification, and show an increased half-life in vivo. This has been shown, e.g., for 
chimeric proteins consisting of the first two domains of the human CD4-polypeptide and various domains 
of the constant regions of the heavy or light chains of mammalian immunoglobulins (See, e.g., EPA 
0,394,827; and Traunecker et al., 1988), which disclosures are hereby incorporated by reference in their 
entireties. Fusion proteins that have a disulfide-linked dimeric structure due to the IgG portion can also be 
more efficient in binding and neutralizing other molecules than monomeric polypeptides or fragments 
thereof alone (See, e.g., Fountoulakis et al., 1995), which disclosure is hereby incorporated by reference 
in its entirety. Nucleic acids encoding the above epitopes can also be recombined with a gene of interest 
as an epitope tag to aid in detection and purification of the expressed polypeptide. 

Additional fusion proteins of the invention may be generated through the techniques of 
gene-shuffling, motif-shuffling, exon-shuffling, or codon-shuffling (collectively referred to as "DNA 
shuffling"). DNA shuffling may be employed to modulate the activities of polypeptides of the present 
invention, thereby effectively generating agonists and antagonists of the polypeptides. See, for example, 
U.S. Patent Nos.: 5,605,793; 5,811,238; 5,834,252; 5,837,458; and Patten et al., (1997); Harayama, 
(1998); Hansson et al., (1999); and Lorenzo and Blasco, (1998). (Each of these documents is hereby 
incorporated by reference). In one embodiment, one or more components, motifs, sections, parts, 
domains, fragments, etc., of coding polynucleotides of the invention, or the polypeptides encoded thereby, 
may be recombined with one or more components, motifs, sections, parts, domains, fragments, etc. of one 
or more heterologous molecules. 

The present invention further encompasses any combination of the polypeptide fragments listed 
in this section. 

Antibodies: 
Definitions 

The present invention further relates to antibodies and T-cell antigen receptors (TCR), which 
specifically bind the polypeptides, and more specifically, the epitopes of the polypeptides of the present 
invention. The antibodies of the present invention include IgG (including IgG1, IgG2, IgG3, and IgG4), 
IgA (including IgA1 and IgA2), IgD, IgE, IgM, and IgY. The term "antibody" (Ab) refers to a 
polypeptide or group of polypeptides which are comprised of at least one binding domain, where a 
binding domain is formed from the folding of variable domains of an antibody molecule to form 
three-dimensional binding spaces with an internal surface shape and charge distribution complementary to the 
features of an antigenic determinant of an antigen, which allows an immunological reaction with the 
antigen. As used herein, the term "antibody" is meant to include whole antibodies, including single-chain 
whole antibodies, and antigen binding fragments thereof. In a preferred embodiment the antibodies are 
human antigen-binding antibody fragments of the present invention, which include, but are not limited to, Fab, 
Fab', F(ab)2 and F(ab')2, Fd, single-chain Fvs (scFv), single-chain antibodies, disulfide-linked Fvs (sdFv) 
and fragments comprising either a VL or VH domain. The antibodies may be from any animal origin 
including birds and mammals. Preferably, the antibodies are human, murine, rabbit, goat, guinea pig, 
camel, horse, or chicken. 

Antigen-binding antibody fragments, including single-chain antibodies, may comprise the 
variable region(s) alone or in combination with all or part of the following: hinge region, CH1, 
CH2, and CH3 domains. Also included in the invention are any combinations of variable region(s) and 
hinge region, CH1, CH2, and CH3 domains. The present invention further includes chimeric, humanized, 
and human monoclonal and polyclonal antibodies, which specifically bind the polypeptides of the present 
invention. The present invention further includes antibodies that are anti-idiotypic to the antibodies of the 
present invention. 

The antibodies of the present invention may be monospecific, bispecific, or trispecific, or have 
greater multispecificity. Multispecific antibodies may be specific for different epitopes of a polypeptide 
of the present invention or may be specific for both a polypeptide of the present invention as well as for 
heterologous compositions, such as a heterologous polypeptide or solid support material. See, e.g., WO 
93/17715; WO 92/08802; WO 91/00360; WO 92/05793; Tutt et al. (1991); US Patents 5,573,920, 
4,474,893, 5,601,819, 4,714,681, 4,925,648; Kostelny et al. (1992), which disclosures are hereby 
incorporated by reference in their entireties. 

Antibodies of the present invention may be described or specified in terms of the epitope(s) or 
epitope-bearing portion(s) of a polypeptide of the present invention, which are recognized or specifically 
bound by the antibody. The antibodies may specifically bind a complete protein encoded by a nucleic acid 
of the present invention, or a fragment thereof, particularly, in the case of secreted proteins, the mature protein 
or the signal peptide. Therefore, the epitope(s) or epitope-bearing polypeptide portion(s) may be specified 
as described herein, e.g., by N-terminal and C-terminal positions, by size in contiguous amino acid 
residues, or otherwise described herein (including the sequence listing). Antibodies which specifically 
bind any epitope or polypeptide of the present invention may also be excluded as individual species. 
Therefore, the present invention includes antibodies that specifically bind specified polypeptides of the 
present invention, and allows for the exclusion of the same. 

Thus, another embodiment of the present invention is a purified or isolated antibody capable of 
specifically binding to a polypeptide comprising a sequence selected from the group consisting of the 
sequences of SEQ ID Nos: 242-482 and the sequences of the clone inserts of the deposited clone pool. In 
one aspect of this embodiment, the antibody is capable of binding to an epitope-containing polypeptide 
comprising at least 6 consecutive amino acids, preferably at least 8 to 10 consecutive amino acids, more 
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 consecutive amino acids of a sequence selected from 
the group consisting of SEQ ID Nos: 242-482 and sequences of the clone inserts of the deposited clone 
pool. 

Antibodies of the present invention may also be described or specified in terms of their 
cross-reactivity. Antibodies that do not specifically bind any other analog, ortholog, or homologue of the 
polypeptides of the present invention are included. Antibodies that do not bind polypeptides with less 
than 95%, less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less 
than 60%, less than 55%, and less than 50% identity (as calculated using methods known in the art and 
described herein, e.g., using FASTDB and the parameters set forth herein) to a polypeptide of the present 
invention are also included in the present invention. Further included in the present invention are 
antibodies that bind only polypeptides encoded by polynucleotides that hybridize to a 
polynucleotide of the present invention under stringent hybridization conditions (as described herein). 
Antibodies of the present invention may also be described or specified in terms of their binding affinity. 
Preferred binding affinities include those with a dissociation constant or Kd less than 5×10⁻⁶ M, 10⁻⁶ M, 
5×10⁻⁷ M, 10⁻⁷ M, 5×10⁻⁸ M, 10⁻⁸ M, 5×10⁻⁹ M, 10⁻⁹ M, 5×10⁻¹⁰ M, 10⁻¹⁰ M, 5×10⁻¹¹ M, 10⁻¹¹ M, 
5×10⁻¹² M, 10⁻¹² M, 5×10⁻¹³ M, 10⁻¹³ M, 5×10⁻¹⁴ M, 10⁻¹⁴ M, 5×10⁻¹⁵ M, and 10⁻¹⁵ M. 

The invention also concerns a purified or isolated antibody capable of specifically binding to a 
mutated GENSET protein or to a fragment or variant thereof comprising an epitope of the mutated 
GENSET protein. 

Preparation of antibodies 

The antibodies of the present invention may be prepared by any suitable method known in the art. 
Some of these methods are described in more detail in the example entitled "Preparation of Antibody 
Compositions to". For example, a polypeptide of the present invention or an antigenic fragment thereof 
can be administered to an animal in order to induce the production of sera containing "polyclonal 
antibodies". As used herein, the term "monoclonal antibody" is not limited to antibodies produced 
through hybridoma technology; rather, it refers to an antibody that is derived from a single clone, 
including any eukaryotic, prokaryotic, or phage clone, and not to the method by which it is produced. 
Monoclonal antibodies can be prepared using a wide variety of techniques known in the art including the 
use of hybridoma, recombinant, and phage display technology. 

Hybridoma techniques include those known in the art (See, e.g., Harlow et al., 1988; Hammerling 
et al., 1981). (Said references incorporated by reference in their entireties). Fab and F(ab')2 fragments 
may be produced, for example, from hybridoma-produced antibodies by proteolytic cleavage, using 
enzymes such as papain (to produce Fab fragments) or pepsin (to produce F(ab')2 fragments). 

Alternatively, antibodies of the present invention can be produced through the application of 
recombinant DNA technology or through synthetic chemistry using methods known in the art. For 
example, the antibodies of the present invention can be prepared using various phage display methods 
known in the art. In phage display methods, functional antibody domains are displayed on the surface of 
a phage particle, which carries polynucleotide sequences encoding them. Phage with a desired binding 
property are selected from a repertoire or combinatorial antibody library (e.g. human or murine) by 
selecting directly with antigen, typically antigen bound or captured to a solid surface or bead. Phage used 
in these methods are typically filamentous phage including fd and M13 with Fab, Fv or 
disulfide-stabilized Fv antibody domains recombinantly fused to either the phage gene III or gene VIII protein. 
Examples of phage display methods that can be used to make the antibodies of the present invention 
include those disclosed in Brinkman et al. (1995); Ames et al. (1995); Kettleborough et al. (1994); 
Persic et al. (1997); Burton et al. (1994); PCT/GB91/01134; WO 90/02809; WO 91/10737; WO 
92/01047; WO 92/18619; WO 93/11236; WO 95/15982; WO 95/20401; and US Patents 5,698,426, 
5,223,409, 5,403,484, 5,580,717, 5,427,908, 5,750,753, 5,821,047, 5,571,698, 5,427,908, 5,516,637, 
5,780,225, 5,658,727 and 5,733,743 (said references incorporated by reference in their entireties). 

As described in the above references, after phage selection, the antibody coding regions from the 
phage can be isolated and used to generate whole antibodies, including human antibodies, or any other 
desired antigen binding fragment, and expressed in any desired host including mammalian cells, insect 
cells, plant cells, yeast, and bacteria. For example, techniques to recombinantly produce Fab, Fab', F(ab)2 
and F(ab')2 fragments can also be employed using methods known in the art such as those disclosed in 
WO 92/22324; Mullinax et al. (1992); Sawai et al. (1995); and Better et al. (1988) (said references 
incorporated by reference in their entireties). 

Examples of techniques which can be used to produce single-chain Fvs and antibodies include 
those described in U.S. Patents 4,946,778 and 5,258,498; Huston et al. (1991); Shu et al. (1993); and 
Skerra et al. (1988), which disclosures are hereby incorporated by reference in their entireties. For some 
uses, including in vivo use of antibodies in humans and in vitro detection assays, it may be preferable to 
use chimeric, humanized, or human antibodies. Methods for producing chimeric antibodies are known in 
the art. See, e.g., Morrison (1985); Oi et al. (1986); Gillies et al. (1989); and US Patent 5,807,715, which 
disclosures are hereby incorporated by reference in their entireties. Antibodies can be humanized using a 
variety of techniques including CDR-grafting (EP 0 239 400; WO 91/09967; US Patents 5,530,101 and 
5,585,089), veneering or resurfacing (EP 0 592 106; EP 0 519 596; Padlan, 1991; Studnicka et al., 1994; 
Roguska et al., 1994), and chain shuffling (US Patent 5,565,332), which disclosures are hereby 
incorporated by reference in their entireties. Human antibodies can be made by a variety of methods 
known in the art including phage display methods described above. See also, US Patents 4,444,887, 
4,716,111, 5,545,806, and 5,814,318; WO 98/46645; WO 98/50433; WO 98/24893; WO 96/34096; WO 
96/33735; and WO 91/10741 (said references incorporated by reference in their entireties). 

Further included in the present invention are antibodies recombinantly fused or chemically 
conjugated (including both covalent and non-covalent conjugations) to a polypeptide of the present 
invention. The antibodies may be specific for antigens other than polypeptides of the present invention. 
For example, antibodies of the present invention may be recombinantly fused or conjugated to molecules 
useful as labels in detection assays and effector molecules such as heterologous polypeptides, drugs, or 
toxins. See, e.g., WO 92/08495; WO 91/14438; WO 89/12624; US Patent 5,314,995; and EP 0 396 387, 
which disclosures are hereby incorporated by reference in their entireties. Fused antibodies may also be 
used to target the polypeptides of the present invention to particular cell types, either in vitro or in vivo, 
by fusing or conjugating the polypeptides of the present invention to antibodies specific for particular cell 
surface receptors. Antibodies fused or conjugated to the polypeptides of the present invention may also 
be used in in vitro immunoassays and purification methods using methods known in the art (See, e.g., Harlow 
et al., supra; WO 93/21232; EP 0 439 095; Naramura, M. et al., 1994; US Patent 5,474,981; Gillies et al., 
1992; Fell et al., 1991) (said references incorporated by reference in their entireties). 

The present invention further includes compositions comprising the polypeptides of the present 
invention fused or conjugated to antibody domains other than the variable regions. For example, the 
polypeptides of the present invention may be fused or conjugated to an antibody Fc region, or portion 
thereof. The antibody portion fused to a polypeptide of the present invention may comprise the hinge 
region, CH1 domain, CH2 domain, and CH3 domain or any combination of whole domains or portions 
thereof. The polypeptides of the present invention may be fused or conjugated to the above antibody 
portions to increase the in vivo half-life of the polypeptides or for use in immunoassays using methods 
known in the art. The polypeptides may also be fused or conjugated to the above antibody portions to 
form multimers. For example, Fc portions fused to the polypeptides of the present invention can form 
dimers through disulfide bonding between the Fc portions. Higher multimeric forms can be made by 
fusing the polypeptides to portions of IgA and IgM. Methods for fusing or conjugating the polypeptides 
of the present invention to antibody portions are known in the art. See, e.g., US Patents 5,336,603, 
5,622,929, 5,359,046, 5,349,053, 5,447,851, 5,112,946; EP 0 307 434, EP 0 367 166; WO 96/04388, WO 
91/06570; Ashkenazi et al. (1991); Zheng et al. (1995); and Vil et al. (1992) (said references incorporated 
by reference in their entireties). 

Non-human animals or mammals, whether wild-type or transgenic, which express a different 
species of GENSET than the one to which antibody binding is desired, and animals which do not express 
GENSET (i.e., a GENSET knock-out animal as described herein) are particularly useful for preparing 
antibodies. GENSET knock-out animals will recognize all or most of the exposed regions of a GENSET 
protein as foreign antigens, and therefore produce antibodies against a wider array of GENSET epitopes. 
Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in obtaining specific 
binding to any one of the GENSET proteins. In addition, the humoral immune system of animals which 
produce a species of GENSET that resembles the antigenic sequence will preferentially recognize the 
differences between the animal's native GENSET species and the antigen sequence, and produce 
antibodies to these unique sites in the antigen sequence. Such a technique will be particularly useful in 
obtaining antibodies that specifically bind to any one of the GENSET proteins. 

The antibodies of the invention may be labeled by any one of the radioactive, fluorescent or 
enzymatic labels known in the art. 

Uses of polynucleotides 

Uses of polynucleotides as reagents 

The polynucleotides of the present invention, particularly those described in the "Oligonucleotide 
primers and probes" section, may be used as reagents in isolation procedures, diagnostic assays, and 
forensic procedures. For example, sequences from the GENSET polynucleotides of the invention may be 
detectably labeled and used as probes to isolate other sequences capable of hybridizing to them. In 
addition, sequences from the GENSET polynucleotides of the invention may be used to design PCR 
primers to be used in isolation, diagnostic, or forensic procedures. 

In forensic analyses 

PCR primers may be used in forensic analyses, such as the DNA fingerprinting techniques 
described below. Such analyses may utilize detectable probes or primers based on the sequences of the 
polynucleotides of the invention. Consequently, the present invention encompasses methods of 
identification of an individual using the polynucleotides of the invention in forensic analyses, wherein 
said method includes the steps of: 

a) obtaining a biological sample containing nucleic acid material from an individual; 

b) obtaining an identification pattern for this individual using the polynucleotides of the 
invention, particularly using GENSET primers and probes; 

c) comparing said identification pattern with a reference identification pattern; and 

d) determining whether said identification pattern is identical to said reference identification 
pattern. 

In one embodiment of this method, the identification pattern consists of sequences of amplicons 
obtained using GENSET primers as explained in the sections entitled "Forensic Matching by DNA 
Sequencing" and "Positive Identification by DNA Sequencing". 

In another embodiment, the identification pattern consists of unique band or dot patterns obtained 
using any method described in the sections entitled "Southern Blot Forensic Identification", "Dot Blot 
Identification Procedure" and "Alternative 'Fingerprint' Identification Technique". 

Forensic Matching by DNA Sequencing 

In one exemplary method, DNA samples are isolated from forensic specimens of, for example, 
hair, semen, blood or skin cells by conventional methods. A panel of PCR primers designed from 
different polynucleotides of the invention using any technique known to those skilled in the art, including 
those described herein, is then utilized to amplify DNA of approximately 100-200 bases in length from 
the forensic specimen. Corresponding sequences are obtained from a test subject. Each of these 
identification DNAs is then sequenced using standard techniques, and a simple database comparison 
determines the differences, if any, between the sequences from the subject and those from the sample. 
Statistically significant differences between the suspect's DNA sequences and those from the sample 
conclusively prove a lack of identity. This lack of identity can be proven, for example, with only one 
sequence. Identity, on the other hand, should be demonstrated with a large number of sequences, all 
matching. Preferably, a minimum of 50 statistically identical sequences of 100 bases in length are used to 
prove identity between the suspect and the sample. 
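
The asymmetry described above (a single differing sequence suffices to exclude, whereas identity calls for many matching sequences) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: every difference is treated as significant, sequencing error is ignored, and the function name, locus labels, and return values are hypothetical.

```python
from typing import Dict

# Threshold taken from the text: at least 50 matching sequences.
MIN_MATCHING_SEQUENCES = 50


def compare_profiles(sample: Dict[str, str], suspect: Dict[str, str],
                     min_matching: int = MIN_MATCHING_SEQUENCES) -> str:
    """Compare amplicon sequences keyed by primer pair (locus).

    Returns "excluded" if any shared locus differs, "match" if at least
    min_matching shared loci are all identical, and "inconclusive"
    otherwise.
    """
    shared = sorted(set(sample) & set(suspect))
    mismatched = [locus for locus in shared
                  if sample[locus].upper() != suspect[locus].upper()]
    if mismatched:
        return "excluded"
    if len(shared) >= min_matching:
        return "match"
    return "inconclusive"


# Hypothetical data: 50 loci of 100 bases each.
specimen = {f"locus{i}": "ACGT" * 25 for i in range(50)}
print(compare_profiles(specimen, dict(specimen)))
```

A real comparison would also need a statistical model of sequencing error and of allele frequencies in the population, which this sketch deliberately omits.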

Positive Identification by DNA Sequencing 

The "Forensic Matching by DNA Sequencing" technique described herein may also be used on a 
larger scale to provide a unique fingerprint-type identification of any individual. In this technique, 
primers are prepared from a large number of polynucleotides of the invention. Preferably, 20 to 50 
different primers are used. These primers are used to obtain a corresponding number of PCR-generated 
DNA segments from the individual in question. Each of these DNA segments is sequenced. The 
database of sequences generated through this procedure uniquely identifies the individual from whom the 
sequences were obtained. The same panel of primers may then be used at any later time to absolutely 
correlate tissue or other biological specimen with that individual. 

Southern Blot Forensic Identification 

The "Positive Identification by DNA Sequencing" procedure described herein is repeated to 
obtain a panel of at least 10 amplified sequences from an individual and a specimen. Preferably, the panel 
contains at least 50 amplified sequences. More preferably, the panel contains 100 amplified sequences. 
In some embodiments, the panel contains 200 amplified sequences. This PCR-generated DNA is then 
digested with one or a combination of restriction enzymes, preferably four-base-specific enzymes. Such enzymes 
are commercially available and known to those of skill in the art. After digestion, the resultant gene 
fragments are size separated in multiple duplicate wells on an agarose gel and transferred to nitrocellulose 
using Southern blotting techniques well known to those with skill in the art. For a review of Southern 
blotting see Davis et al. (1986), which disclosure is hereby incorporated by reference in its entirety. 

A panel of probes based on the sequences of the polynucleotides of the invention, or fragments 
thereof of at least 10 bases, are radioactively or colorimetrically labeled using methods known in the art, 
such as nick translation or end labeling, and hybridized to the Southern blot using techniques known in 
the art. Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the 
polynucleotide of the invention. More preferably, the probe comprises at least 20-30 consecutive 
nucleotides from the polynucleotide of the invention. In some embodiments, the probe comprises more 
than 30 nucleotides from the polynucleotide of the invention. In other embodiments, the probe comprises 
at least 40, at least 50, at least 75, at least 100, at least 150, or at least 200 consecutive nucleotides from 
the polynucleotide of the invention. 

Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least about 20 
or 30 are used to provide a unique pattern. The resultant bands appearing from the hybridization of a 
large sample of polynucleotides of the invention will be a unique identifier. Since the restriction enzyme 
cleavage will be different for every individual, the band pattern on the Southern blot will also be unique. 
Increasing the number of cDNA probes will provide a statistically higher level of confidence in the 
identification since there will be an increased number of sets of bands used for identification. 
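
Why a sequence difference at a restriction site changes the band pattern can be shown with a small in silico digest in Python. This is a sketch under stated assumptions: the four-base site GATC is used purely as an illustration, the cut is placed at the start of each site, the molecule is treated as linear, and the two short sequences are invented for the example.

```python
from typing import List


def digest(sequence: str, site: str) -> List[int]:
    """Return fragment lengths (largest first) after cutting a linear
    sequence at the start of every occurrence of the recognition site."""
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        if pos > 0:  # a site at position 0 produces no extra fragment
            cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Two hypothetical alleles differing by one base inside the second site.
allele_1 = "AAAA" + "GATC" + "CCCCC" + "GATC" + "TT"
allele_2 = "AAAA" + "GATC" + "CCCCC" + "GTTC" + "TT"
print(digest(allele_1, "GATC"))
print(digest(allele_2, "GATC"))
```

The first hypothetical allele yields three fragments and the second only two, so the two individuals would show different bands with a probe spanning this region.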

Dot Blot Identification Procedure 

Another technique for identifying individuals using the polynucleotide sequences disclosed herein 
utilizes a dot blot hybridization technique. 

Genomic DNA is isolated from nuclei of the subject to be identified. Oligonucleotide probes of
approximately 30 bp in length are synthesized that correspond to at least 10, preferably 50, sequences
from the polynucleotide of the invention. The probes are hybridized to the genomic DNA under
conditions known to those in the art. The oligonucleotides are end-labeled with 32P using polynucleotide
kinase (Pharmacia). Dot blots are created by spotting the genomic DNA onto nitrocellulose or the like
using a vacuum dot blot manifold (BioRad, Richmond, California). The genomic DNA is fixed to the
nitrocellulose filter by baking or UV cross-linking, and the filter is prehybridized and hybridized with labeled
probe using techniques known in the art (Davis et al., 1986). The 32P-labeled DNA fragments are
sequentially hybridized under successively more stringent conditions to detect minimal differences between the
30 bp sequence and the DNA. Tetramethylammonium chloride is useful for identifying clones containing
small numbers of nucleotide mismatches (Wood et al., 1985). A unique pattern of dots distinguishes one
individual from another individual.
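By way of illustration only, the selection of approximately 30 bp probe sequences described above can be sketched as follows. The helper name `tile_probes`, the placeholder sequence, and the choice of non-overlapping windows are assumptions made for this sketch; the procedure itself does not prescribe how the probe sequences are chosen.

```python
def tile_probes(sequence, probe_length=30, max_probes=50):
    """Cut non-overlapping windows of probe_length out of a sequence.

    Non-overlapping tiling is an illustrative assumption; any set of
    sub-sequences of the stated length would fit the description.
    """
    sequence = sequence.upper()
    probes = []
    for start in range(0, len(sequence) - probe_length + 1, probe_length):
        probes.append(sequence[start:start + probe_length])
        if len(probes) == max_probes:
            break
    return probes


# Placeholder sequence for illustration only (not a sequence of the invention).
example = "ACGT" * 100
probes = tile_probes(example)
print(len(probes), "probes of", len(probes[0]), "nt each")
```

The `max_probes` default of 50 mirrors the preferred number of sequences mentioned above.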

Alternative "Fingerprint" Identification Technique 

In a representative alternative fingerprinting procedure, the probes are derived from cDNAs. 
Preferably, a plurality of probes having sequences from different genes are used as follows. 

Polynucleotides containing at least 10 consecutive bases from these sequences can be used as probes.
Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the polynucleotide of
the invention. More preferably, the probe comprises at least 20-30 consecutive nucleotides from the
polynucleotide of the invention. In some embodiments, the probe comprises more than 30 nucleotides
from the polynucleotide of the invention. In other embodiments, the probe comprises at least 40, at least
50, at least 75, at least 100, at least 150, or at least 200 consecutive nucleotides from the polynucleotide of
the invention.

Oligonucleotides, generally 20-mers, are prepared from a large number, e.g. 50, 100, or 200, of
polynucleotides of the invention using commercially available oligonucleotide services such as Genset,
Paris, France. Cell samples from the test subject are processed for DNA using techniques well known to
those with skill in the art. The nucleic acid is digested with restriction enzymes such as EcoRI and XbaI.
Following digestion, samples are applied to wells for electrophoresis. The procedure, as known in the art,
may be modified to accommodate polyacrylamide electrophoresis; however, in this example, samples
containing 5 µg of DNA are loaded into wells and separated on 0.8% agarose gels. The DNA in the gels is
transferred onto nitrocellulose using standard Southern blotting techniques.

10 ng of each of the oligonucleotides are pooled and end-labeled with 32P. The nitrocellulose is
prehybridized with blocking solution and hybridized with the labeled probes. Following hybridization
and washing, the nitrocellulose filter is exposed to X-Omat AR X-ray film. The resulting hybridization
pattern will be unique for each individual.

It is additionally contemplated within this example that the number of probe sequences used can 
be varied for additional accuracy or clarity. 

To find corresponding genomic DNA sequences

The GENSET cDNAs of the invention may also be used to clone sequences located upstream of 
the cDNAs of the invention on the corresponding genomic DNA. Such upstream sequences may be 
capable of regulating gene expression, including promoter sequences, enhancer sequences, and other 
upstream sequences which influence transcription or translation levels. Once identified and cloned, these
upstream regulatory sequences may be used in expression vectors designed to direct the expression of an
inserted gene in a desired spatial, temporal, developmental, or quantitative fashion.

Use of cDNAs or Fragments thereof to Clone Upstream Sequences from Genomic DNA 

Sequences derived from polynucleotides of the invention may be used to isolate the promoters of
the corresponding genes using chromosome walking techniques. In one chromosome walking technique,
which utilizes the Genome Walker™ kit available from Clontech, five complete genomic DNA samples
are each digested with a different restriction enzyme which has a 6 base recognition site and leaves a
blunt end. Following digestion, oligonucleotide adapters are ligated to each end of the resulting genomic
DNA fragments.

For each of the five genomic DNA libraries, a first PCR reaction is performed according to the
manufacturer's instructions (which are incorporated herein by reference) using an outer adaptor primer
provided in the kit and an outer gene specific primer. The gene specific primer should be selected to be
specific for the polynucleotide of the invention of interest and should have a melting temperature, length,
and location in the polynucleotide of the invention which is consistent with its use in PCR reactions. Each
first PCR reaction contains 5 ng of genomic DNA, 5 µl of 10X Tth reaction buffer, 0.2 mM of each dNTP,
0.2 nM each of outer adaptor primer and outer gene specific primer, 1.1 mM of Mg(OAc)2, and 1 µl of
the Tth polymerase 50X mix in a total volume of 50 µl. The reaction cycle for the first PCR reaction is as
follows: 1 min at 94°C / 2 sec at 94°C, 3 min at 72°C (7 cycles) / 2 sec at 94°C, 3 min at 67°C (32 cycles) /
5 min at 67°C.
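As a non-limiting sketch, the first PCR cycling program described above can be written down as data and its total programmed hold time computed. The names `FIRST_PCR`, `total_hold_seconds`, and `final_strength` are hypothetical, and ramp times between temperatures are ignored, which is an assumption of this sketch rather than a statement about any particular thermocycler.

```python
# Each stage is (number of cycles, [(temperature in deg C, hold in seconds), ...]).
FIRST_PCR = [
    (1, [(94, 60)]),             # 1 min at 94 deg C
    (7, [(94, 2), (72, 180)]),   # 7 cycles: 2 sec at 94, 3 min at 72
    (32, [(94, 2), (67, 180)]),  # 32 cycles: 2 sec at 94, 3 min at 67
    (1, [(67, 300)]),            # final 5 min at 67 deg C
]


def total_hold_seconds(program):
    """Sum the programmed hold times, ignoring ramping between steps."""
    return sum(cycles * sum(hold for _, hold in steps) for cycles, steps in program)


def final_strength(stock_strength, stock_volume_ul, total_volume_ul):
    """Final strength of a concentrated stock after dilution into the reaction."""
    return stock_strength * stock_volume_ul / total_volume_ul


print(total_hold_seconds(FIRST_PCR) / 60.0, "minutes of programmed holds")
print(final_strength(10, 5, 50), "X buffer in the 50 µl reaction")
```

The second helper simply confirms that 5 µl of 10X buffer in a 50 µl reaction gives a 1X final strength.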

The product of the first PCR reaction is diluted and used as a template for a second PCR reaction
according to the manufacturer's instructions using a pair of nested primers which are located internally on
the amplicon resulting from the first PCR reaction. For example, 5 µl of the reaction product of the first
PCR reaction mixture may be diluted 180 times. Reactions are made in a 50 µl volume having a
composition identical to that of the first PCR reaction except the nested primers are used. The first nested
primer is specific for the adaptor, and is provided with the Genome Walker™ kit. The second nested
primer is specific for the particular polynucleotide of the invention for which the promoter is to be cloned
and should have a melting temperature, length, and location in the polynucleotide of the invention which
is consistent with its use in PCR reactions. The reaction parameters of the second PCR reaction are as
follows: 1 min at 94°C / 2 sec at 94°C, 3 min at 72°C (6 cycles) / 2 sec at 94°C, 3 min at 67°C (25 cycles) /
5 min at 67°C.
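The dilution step above can be worked through in a few lines. This sketch assumes that "diluted 180 times" means a 180-fold dilution by final volume; the function name `dilution_volumes` is hypothetical.

```python
def dilution_volumes(sample_ul, fold):
    """Return (final volume, diluent volume) in µl for a fold dilution.

    Assumes the fold refers to the ratio of final volume to sample volume.
    """
    final_ul = sample_ul * fold
    return final_ul, final_ul - sample_ul


final_ul, diluent_ul = dilution_volumes(5, 180)
print("final volume:", final_ul, "µl; diluent to add:", diluent_ul, "µl")
```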

The product of the second PCR reaction is purified, cloned, and sequenced using standard
techniques. Alternatively, two or more human genomic DNA libraries can be constructed by using two or
more restriction enzymes. The digested genomic DNA is cloned into vectors which can be converted into
single stranded, circular, or linear DNA. A biotinylated oligonucleotide comprising at least 15
nucleotides from the polynucleotide of the invention sequence is hybridized to the single stranded DNA.
Hybrids between the biotinylated oligonucleotide and the single stranded DNA containing the
polynucleotide of the invention sequence are isolated as described herein. Thereafter, the single stranded
DNA containing the polynucleotide of the invention sequence is released from the beads and converted
into double stranded DNA using a primer specific for the polynucleotide of the invention sequence or a
primer corresponding to a sequence included in the cloning vector. The resulting double stranded DNA is
transformed into bacteria. DNAs containing the GENSET polynucleotide sequences are identified by
colony PCR or colony hybridization.

Identification of Promoters in Cloned Upstream Sequences

Once the upstream genomic sequences have been cloned and sequenced as described above,
prospective promoters and transcription start sites within the upstream sequences may be identified by
comparing the sequences upstream of the polynucleotides of the invention with databases containing
known transcription start sites, transcription factor binding sites, or promoter sequences.

In addition, promoters in the upstream sequences may be identified using promoter reporter
vectors as follows. The expression of the reporter gene will be detected when placed under the control of
regulatory active polynucleotide fragments or variants of the GENSET promoter region located upstream
of the first exon of the GENSET gene. Suitable promoter reporter vectors, into which the GENSET
promoter sequences may be cloned, include pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer,
or pEGFP-1 Promoter Reporter vectors available from Clontech, or pGL2-basic or pGL3-basic
promoterless luciferase reporter gene vectors from Promega. Briefly, each of these promoter reporter
vectors includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable
protein such as secreted alkaline phosphatase, luciferase, beta-galactosidase, or green fluorescent protein.
The sequences upstream of the GENSET coding region are inserted into the cloning sites upstream of the
reporter gene in both orientations and introduced into an appropriate host cell. The level of reporter
protein is assayed and compared to the level obtained from a vector which lacks an insert in the cloning
site. The presence of an elevated expression level in the vector containing the insert with respect to the
control vector indicates the presence of a promoter in the insert. If necessary, the upstream sequences can
be cloned into vectors which contain an enhancer for increasing transcription levels from weak promoter
sequences. A significant level of expression above that observed with the vector lacking an insert
indicates that a promoter sequence is present in the inserted upstream sequence.
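The comparison against the insert-free control vector described above may be illustrated, purely as a sketch, by a fold-over-control calculation. The reporter readings are invented placeholder values, the function names are hypothetical, and the two-fold cutoff is an arbitrary assumption standing in for whatever "significant level" is appropriate to a given assay.

```python
def fold_over_control(insert_signal, control_signal):
    """Reporter signal of the insert-containing vector relative to the empty vector."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return insert_signal / control_signal


def has_promoter_activity(insert_signal, control_signal, min_fold=2.0):
    """Apply an illustrative, arbitrary fold cutoff (assumption of this sketch)."""
    return fold_over_control(insert_signal, control_signal) >= min_fold


# Placeholder readings in arbitrary units, not experimental data.
control = 100
for orientation, signal in [("forward", 850), ("reverse", 120)]:
    fold = fold_over_control(signal, control)
    print(orientation, fold, has_promoter_activity(signal, control))
```

Testing both orientations, as the text describes, helps distinguish orientation-dependent promoter activity from background.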

Promoter sequences within the upstream genomic DNA may be further defined by site directed
mutagenesis, linker scanning analysis, or other techniques familiar to those skilled in the art. For
example, the boundaries of promoters may be further investigated by constructing nested 5' and/or 3'
deletions in the upstream DNA using conventional techniques such as Exonuclease III or appropriate
restriction endonuclease digestion. The resulting deletion fragments can be inserted into the promoter
reporter vector to determine whether the deletion has increased, reduced or eliminated promoter activity,
such as described, for example, by Coles et al. (1998), the disclosure of which is incorporated herein by
reference in its entirety. In this way, the boundaries of the promoters may be defined. If desired,
potential individual regulatory sites within the promoter may be identified using site directed mutagenesis
or linker scanning to obliterate potential transcription factor binding sites within the promoter individually
or in combination. The effects of these mutations on transcription levels may be determined by inserting
the mutations into cloning sites in promoter reporter vectors. This type of assay is well known to those
skilled in the art and is described in WO 97/17359; US Patent No. 5,374,544; EP 582 796; US Patent No.
5,698,389; US Patent No. 5,643,746; US Patent No. 5,502,176; and US Patent No. 5,266,488; the disclosures of which
are incorporated by reference herein in their entirety.

The strength and the specificity of the promoter of each GENSET gene can be assessed through
the expression levels of a detectable polynucleotide operably linked to the GENSET promoter in different
types of cells and tissues. The detectable polynucleotide may be either a polynucleotide that specifically
hybridizes with a predefined oligonucleotide probe, or a polynucleotide encoding a detectable protein,
including a GENSET polypeptide or a fragment or a variant thereof. This type of assay is well known to
those skilled in the art and is described in US Patent No. 5,502,176; and US Patent No. 5,266,488; the
disclosures of which are incorporated by reference herein in their entirety. Some of the methods are
discussed in more detail elsewhere in the application.

The promoters and other regulatory sequences located upstream of the polynucleotides of the
invention may be used to design expression vectors capable of directing the expression of an inserted
gene in a desired spatial, temporal, developmental, or quantitative manner. A promoter capable of
directing the desired spatial, temporal, developmental, and quantitative patterns may be selected using the
results of the expression analysis described herein. For example, if a promoter which confers a high level
of expression in muscle is desired, the promoter sequence upstream of a polynucleotide of the invention
derived from an mRNA which is expressed at a high level in muscle may be used in the expression
vector. Such vectors are described in more detail elsewhere in the application.

Preferably, the desired promoter is placed near multiple restriction sites to facilitate the cloning of
the desired insert downstream of the promoter, such that the promoter is able to drive expression of the
inserted gene. The promoter may be inserted in conventional nucleic acid backbones designed for
extrachromosomal replication, integration into the host chromosomes or transient expression. Suitable
backbones for the present expression vectors include retroviral backbones, backbones from eukaryotic
episomes such as SV40 or Bovine Papilloma Virus, backbones from bacterial episomes, or artificial
chromosomes.

Preferably, the expression vectors also include a polyA signal downstream of the multiple 
restriction sites for directing the polyadenylation of mRNA transcribed from the gene inserted into the
expression vector. 

To find similar sequences 

Polynucleotides of the invention may be used to isolate and/or purify nucleic acids similar thereto
using any methods well known to those skilled in the art including the techniques based on hybridization
or on amplification described in this section. These methods may be used to obtain the genomic DNAs
which encode the mRNAs from which the GENSET cDNAs are derived, mRNAs corresponding to
GENSET cDNAs, or nucleic acids which are homologous to GENSET cDNAs or fragments thereof, such
as variants, species homologues or orthologs. Thus, a plurality of cDNAs similar to GENSET
polynucleotides may be provided as cDNA libraries for subsequent evaluation of the encoded proteins or
use in diagnostic assays as described herein. cDNAs prepared by any method described herein may be
subsequently engineered to obtain nucleic acids which include desired fragments of the cDNA using
conventional techniques such as subcloning, PCR, or in vitro oligonucleotide synthesis. For example,
nucleic acids which include only the coding sequences may be obtained using techniques known to those
skilled in the art. Similarly, nucleic acids containing any other desired fragment of the coding sequences
for the encoded protein may be obtained.

Indeed, cDNAs of the present invention or fragments thereof may be used to isolate nucleic acids 
similar to cDNAs from a cDNA library or a genomic DNA library. Such cDNA libraries or genomic 
DNA libraries may be obtained from a commercial source or made using techniques familiar to those
skilled in the art such as those described in PCT publication WO 00/37491, which disclosure is hereby 
incorporated by reference in its entirety. Examples of methods for obtaining nucleic acids similar to 
GENSET polynucleotides are described below. 

Hybridization-based methods 

Techniques for identifying cDNA clones in a cDNA library which hybridize to a given probe
sequence are disclosed in Sambrook et al. (1989) and in Hames and Higgins (1985), the disclosures of
which are incorporated herein by reference in their entireties. The same techniques may be used to isolate
genomic DNAs.

Briefly, cDNA or genomic DNA clones which hybridize to the detectable probe are identified and
isolated for further manipulation as follows. Any polynucleotide fragment of the invention may be used as
a probe, in particular those defined in the "Oligonucleotide primers and probes" section. A probe
comprising at least 10 consecutive nucleotides from a GENSET cDNA or fragment thereof is labeled with
a detectable label such as a radioisotope or a fluorescent molecule. Preferably, the probe comprises at
least 12, 15, or 17 consecutive nucleotides from the cDNA or fragment thereof. More preferably, the
probe comprises 20 to 30 consecutive nucleotides from the cDNA or fragment thereof. In some
embodiments, the probe comprises more than 30 nucleotides from the cDNA or fragment thereof.

Techniques for labeling the probe are well known and include phosphorylation with
polynucleotide kinase, nick translation, in vitro transcription, and non-radioactive techniques. The
cDNAs or genomic DNAs in the library are transferred to a nitrocellulose or nylon filter and denatured.
After blocking of non-specific sites, the filter is incubated with the labeled probe for an amount of time
sufficient to allow binding of the probe to cDNAs or genomic DNAs containing a sequence capable of
hybridizing thereto.

By varying the stringency of the hybridization conditions used to identify cDNAs or genomic
DNAs which hybridize to the detectable probe, cDNAs or genomic DNAs having different levels of
identity to the probe can be identified and isolated as described below.

Stringent conditions

"Stringent hybridization conditions" are defined as conditions in which only nucleic acids having
a high level of identity to the probe are able to hybridize to said probe. These conditions may be
calculated as follows:

For probes between 14 and 70 nucleotides in length the melting temperature (Tm) is calculated
using the formula: Tm = 81.5 + 16.6(log (Na+)) + 0.41(fraction G+C) - (600/N), where N is the length of the
probe.

If the hybridization is carried out in a solution containing formamide, the melting temperature
may be calculated using the equation: Tm = 81.5 + 16.6(log (Na+)) + 0.41(fraction G+C) - (0.63 x %
formamide) - (600/N), where N is the length of the probe.
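The two melting temperature formulas above can be sketched as a single function, offered only as an illustration. The function name `melting_temperature` and its parameter names are hypothetical. The sketch assumes that "log" denotes the base-10 logarithm and that the sodium concentration is molar. The G+C term is passed through exactly as the caller supplies it; the commonly cited form of this formula takes G+C as a percentage (0 to 100), and the example call below follows that convention as an assumption.

```python
import math


def melting_temperature(na_molar, gc, length, percent_formamide=0.0):
    """Tm per the formulas in the text (stated there for probes of 14 to 70 nt).

    na_molar          -- sodium ion concentration in mol/l (assumed unit)
    gc                -- G+C content, in the unit the caller intends
    length            -- probe length N in nucleotides
    percent_formamide -- formamide percentage; 0 gives the first formula
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc
            - 0.63 * percent_formamide
            - 600.0 / length)


# Illustrative values only: 1 M Na+, 50 (percent) G+C, 30-nt probe.
print(melting_temperature(1.0, 50, 30))
print(melting_temperature(1.0, 50, 30, percent_formamide=50))
```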

Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg
denatured fragmented salmon sperm DNA or 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg
denatured fragmented salmon sperm DNA, 50% formamide. The formulas for SSC and Denhardt's
solutions are listed in Sambrook et al., 1986.

Hybridization is conducted by adding the detectable probe to the prehybridization solutions listed
above. Where the probe comprises double stranded DNA, it is denatured before addition to the
hybridization solution. The filter is contacted with the hybridization solution for a sufficient period of
time to allow the probe to hybridize to nucleic acids containing sequences complementary thereto or
homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at
15-25°C below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be
conducted at 15-25°C below the Tm. Preferably, for hybridizations in 6X SSC, the hybridization is
conducted at approximately 68°C. Preferably, for hybridizations in 50% formamide containing solutions,
the hybridization is conducted at approximately 42°C.
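As a small worked illustration, the hybridization temperature window of 15-25°C below the Tm stated above can be computed from a given Tm. The function name `hybridization_window` and the example Tm value are assumptions of this sketch.

```python
def hybridization_window(tm_celsius, low_offset=15.0, high_offset=25.0):
    """Return (lowest, highest) hybridization temperature for a given Tm.

    The offsets default to the 15-25 deg C range stated in the text.
    """
    return tm_celsius - high_offset, tm_celsius - low_offset


# Illustrative Tm only.
low, high = hybridization_window(82.0)
print("hybridize between", low, "and", high, "deg C")
```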

Following hybridization, the filter is washed in 2X SSC, 0.1% SDS at room temperature for 15
minutes. The filter is then washed with 0.1X SSC, 0.5% SDS at room temperature for 30 minutes to 1
hour. Thereafter, the filter is washed at the hybridization temperature in 0.1X SSC, 0.5% SDS. A
final wash is conducted in 0.1X SSC at room temperature.

Nucleic acids which have hybridized to the probe are identified by autoradiography or other
conventional techniques.

Low and moderate conditions

Changes in the stringency of hybridization and signal detection are primarily accomplished
through the manipulation of formamide concentration (lower percentages of formamide result in lowered
stringency), salt conditions, or temperature. The above procedure may thus be modified to identify
nucleic acids having decreasing levels of identity to the probe sequence. For example, the hybridization
temperature may be decreased in increments of 5°C from 68°C to 42°C in a hybridization buffer having a
sodium concentration of approximately 1M. Following hybridization, the filter may be washed with 2X
SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be "moderate"
conditions above 50°C and "low" conditions below 50°C. Alternatively, the hybridization may be carried
out in buffers, such as 6X SSC, containing formamide at a temperature of 42°C. In this case, the
concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to
0% to identify clones having decreasing levels of identity to the probe. Following hybridization, the filter
may be washed with 6X SSC, 0.5% SDS at 50°C. These conditions are considered to be "moderate"
conditions above 25% formamide and "low" conditions below 25% formamide. cDNAs or genomic
DNAs which have hybridized to the probe are identified by autoradiography or other conventional
techniques.
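The stepwise reduction in stringency described above may be sketched as follows. All function names are hypothetical. The text does not say how a condition of exactly 50°C or exactly 25% formamide is classified, so this sketch reports such boundary values as "unspecified" rather than guessing.

```python
def temperature_series(start=68, stop=42, step=5):
    """Hybridization temperatures stepping down by `step` without going below `stop`."""
    return list(range(start, stop - 1, -step))


def formamide_series(start=50, stop=0, step=5):
    """Formamide percentages stepping down from `start` to `stop`."""
    return list(range(start, stop - 1, -step))


def classify_by_temperature(temp_celsius):
    """'moderate' above 50 deg C, 'low' below 50 deg C, per the text."""
    if temp_celsius > 50:
        return "moderate"
    if temp_celsius < 50:
        return "low"
    return "unspecified"


def classify_by_formamide(percent):
    """'moderate' above 25% formamide, 'low' below 25%, per the text."""
    if percent > 25:
        return "moderate"
    if percent < 25:
        return "low"
    return "unspecified"


for temp in temperature_series():
    print(temp, classify_by_temperature(temp))
```

Because 26°C separate 68°C from 42°C, 5°C steps starting at 68°C do not land exactly on 42°C; the series stops at the last step that is still at or above it.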

Note that variations in the above conditions may be accomplished through the inclusion and/or
substitution of alternate blocking reagents used to suppress background in hybridization experiments.
Typical blocking reagents include Denhardt's reagent, BLOTTO, heparin, denatured salmon sperm DNA,
and commercially available proprietary formulations. The inclusion of specific blocking reagents may
require modification of the hybridization conditions described above, due to problems with compatibility.

Consequently, the present invention encompasses methods of isolating nucleic acids similar to the 
polynucleotides of the invention, comprising the steps of: 

a) contacting a collection of cDNA or genomic DNA molecules with a detectable probe
comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40 or 50 consecutive nucleotides of a sequence
selected from the group consisting of the sequences of SEQ ID Nos: 1-241, the sequences of clone
inserts of the deposited clone pool and sequences complementary thereto under stringent, moderate or low
conditions which permit said probe to hybridize to at least a cDNA or genomic DNA molecule in said
collection;

b) identifying said cDNA or genomic DNA molecule which hybridizes to said detectable probe; and

c) isolating said cDNA or genomic DNA molecule which hybridized to said probe.

PCR-based methods

In addition to the above described methods, other protocols are available to obtain homologous
cDNAs using GENSET cDNA of the present invention or fragment thereof as outlined in the following
paragraphs.

cDNAs may be prepared by obtaining mRNA from the tissue, cell, or organism of interest using
mRNA preparation procedures utilizing polyA selection procedures or other techniques known to those
skilled in the art. A first primer capable of hybridizing to the polyA tail of the mRNA is hybridized to the
mRNA and a reverse transcription reaction is performed to generate a first cDNA strand.

The term "capable of hybridizing to the polyA tail of said mRNA" refers to and embraces all
primers containing stretches of thymidine residues, so-called oligo(dT) primers, that hybridize to the 3'
end of eukaryotic poly(A)+ mRNAs to prime the synthesis of a first cDNA strand. Techniques for
generating said oligo(dT) primers and hybridizing them to mRNA to subsequently prime the reverse
transcription of said hybridized mRNA to generate a first cDNA strand are well known to those skilled in
the art and are described in Current Protocols in Molecular Biology, John Wiley and Sons, Inc. 1997 and
Sambrook et al., 1989. Preferably, said oligo(dT) primers are present in a large excess in order to allow
the hybridization of all mRNA 3' ends to at least one oligo(dT) molecule. The priming and reverse
transcription steps are preferably performed between 37°C and 55°C depending on the type of reverse
transcriptase used. Preferred oligo(dT) primers for priming reverse transcription of mRNAs are
oligonucleotides containing a stretch of thymidine residues of sufficient length to hybridize specifically to
the polyA tail of mRNAs, preferably of 12 to 18 thymidine residues in length. More preferably, such
oligo(dT) primers comprise an additional sequence upstream of the poly(dT) stretch in order to allow the
addition of a given sequence to the 5' end of all first cDNA strands which may then be used to facilitate
subsequent manipulation of the cDNA. Preferably, this added sequence is 8 to 60 residues in length. For
instance, the addition of a restriction site at the 5' end of cDNAs facilitates subcloning of the obtained cDNA.
Alternatively, such an added 5' end may also be used to design PCR primers to specifically amplify
cDNA clones of interest.
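An oligo(dT) primer of the kind described above, with an optional added sequence upstream of the thymidine stretch, may be sketched as follows. The function name `build_oligo_dt_primer` is hypothetical. The example upstream sequence is the NotI recognition site GCGGCCGC, chosen only as an illustration of an 8-residue restriction site; the text does not name a particular site.

```python
def build_oligo_dt_primer(upstream="", t_count=18):
    """Return an oligo(dT) primer written 5'->3'.

    upstream -- optional added sequence placed 5' of the poly(dT) stretch;
                the text prefers 8 to 60 residues
    t_count  -- number of thymidine residues; the text prefers 12 to 18
    """
    if not 12 <= t_count <= 18:
        raise ValueError("poly(dT) stretch should be 12 to 18 residues")
    if upstream and not 8 <= len(upstream) <= 60:
        raise ValueError("added sequence should be 8 to 60 residues")
    return upstream.upper() + "T" * t_count


# NotI site used purely as an illustrative added sequence.
primer = build_oligo_dt_primer("GCGGCCGC", 18)
print(primer, len(primer))
```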

The first cDNA strand is then hybridized to a second primer. Any polynucleotide fragment of the
invention may be used, and in particular those described in the "Oligonucleotide primers and probes" section.
This second primer contains at least 10 consecutive nucleotides of a polynucleotide of the invention.
Preferably, the primer comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides of a
polynucleotide of the invention. In some embodiments, the primer comprises more than 30 nucleotides of
a polynucleotide of the invention. If it is desired to obtain cDNAs containing the full protein coding
sequence, including the authentic translation initiation site, the second primer used contains sequences
located upstream of the translation initiation site. The second primer is extended to generate a second
cDNA strand complementary to the first cDNA strand. Alternatively, RT-PCR may be performed as
described above using primers from both ends of the cDNA to be obtained.

The double stranded cDNAs made using the methods described above are isolated and cloned. 
The cDNAs may be cloned into vectors such as plasmids or viral vectors capable of replicating in an 
appropriate host cell. For example, the host cell may be a bacterial, mammalian, avian, or insect cell.

Techniques for isolating mRNA, reverse transcribing a primer hybridized to mRNA to generate a 
first cDNA strand, extending a primer to make a second cDNA strand complementary to the first cDNA 
strand, isolating the double stranded cDNA and cloning the double stranded cDNA are well known to 
those skilled in the art and are described in Current Protocols in Molecular Biology, John Wiley & Sons, 
Inc. 1997 and Sambrook et al., 1989.

Consequently, the present invention encompasses methods of making cDNAs. In a first 
embodiment, the method of making a cDNA comprises the steps of 

a) contacting a collection of mRNA molecules from human cells with a primer comprising at least
12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group
consisting of the sequences complementary to SEQ ID Nos: 1-241 and sequences complementary to a
clone insert of the deposited clone pool;

b) hybridizing said primer to an mRNA in said collection;

c) reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA;

d) making a second cDNA strand complementary to said first cDNA strand; and

e) isolating the resulting cDNA comprising said first cDNA strand and said second cDNA strand.
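In the first embodiment above, the primer is drawn from a sequence complementary to the cDNA sequence so that it can anneal to the mRNA. As an illustration only, such a primer can be derived by taking a window of the sense sequence and writing its reverse complement. The function names and the placeholder sequence are hypothetical, and the sketch handles only the unambiguous bases A, C, G, and T.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def antisense_primer(cdna_sense, start, length=20):
    """Primer complementary to cdna_sense[start:start + length]."""
    window = cdna_sense[start:start + length]
    if len(window) < length:
        raise ValueError("window runs past the end of the sequence")
    return reverse_complement(window)


# Placeholder sequence for illustration only (not a sequence of the invention).
sense = "ATGC" * 6
print(antisense_primer(sense, 0, 12))
```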

Another embodiment of the present invention is a purified cDNA obtainable by the method of the 
preceding paragraph. In one aspect of this embodiment, the cDNA encodes at least a portion of a human 
polypeptide. 

In a second embodiment, the method of making a cDNA comprises the steps of

a) contacting a collection of mRNA molecules from human cells with a first primer capable of
hybridizing to the polyA tail of said mRNA;

b) hybridizing said first primer to said polyA tail;

c) reverse transcribing said mRNA to make a first cDNA strand;

d) making a second cDNA strand complementary to said first cDNA strand using at least one
primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a
sequence selected from the group consisting of SEQ ID Nos: 1-241 and sequences of clone inserts of the
deposited clone pool; and

e) isolating the resulting cDNA comprising said first cDNA strand and said second cDNA strand.

In another aspect of this method the second cDNA strand is made by

a) contacting said first cDNA strand with a second primer comprising at least 12, 15, 18, 20, 23,
25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ
ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool, and a third primer which
sequence is fully included within the sequence of said first primer;

b) performing a first polymerase chain reaction with said second and third primers to generate a
first PCR product;

c) contacting said first PCR product with a fourth primer, comprising at least 12, 15, 18, 20, 23,
25, 28, 30, 35, 40, or 50 consecutive nucleotides of said sequence selected from the group consisting of
SEQ ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool, and a fifth primer, which
sequence is fully included within the sequence of said third primer, wherein said fourth and fifth
primers hybridize to sequences within said first PCR product; and

d) performing a second polymerase chain reaction, thereby generating a second PCR product.
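The containment relationships among the first, third, and fifth primers recited above can be checked with a simple substring test, sketched here for illustration. The function name `is_fully_included` and all three primer sequences are hypothetical placeholders.

```python
def is_fully_included(inner, outer):
    """True when `inner` occurs as a contiguous stretch of `outer`."""
    return bool(inner) and inner.upper() in outer.upper()


# Hypothetical primers for illustration only.
first_primer = "GACTCGAGTCGACATCGA" + "T" * 17  # added sequence plus oligo(dT)
third_primer = "GACTCGAGTCGACATCG"              # drawn from within the first primer
fifth_primer = "GAGTCGACATCG"                   # drawn from within the third primer

print(is_fully_included(third_primer, first_primer))
print(is_fully_included(fifth_primer, third_primer))
```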

Alternatively, the second cDNA strand may be made by contacting said first cDNA strand with a
second primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a
sequence selected from the group consisting of SEQ ID Nos: 1-241 and sequences of clone inserts of the
deposited clone pool, and a third primer which sequence is fully included within the sequence of said first
primer and performing a polymerase chain reaction with said second and third primers to generate said
second cDNA strand.

Alternatively, the second cDNA strand may be made by

a) contacting said first cDNA strand with a second primer comprising at least 12, 15, 18, 20, 23,
25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ
ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool;

b) hybridizing said second primer to said first strand cDNA; and

c) extending said hybridized second primer to generate said second cDNA strand.

Another embodiment of the present invention is a purified cDNA obtainable by a method of
making a cDNA of the invention. In one aspect of this embodiment, said cDNA encodes at least a portion
of a human polypeptide.

Other protocols 

Alternatively, other procedures may be used for obtaining homologous cDNAs. In one approach,
cDNAs are prepared from mRNA and cloned into double stranded phagemids as follows. The cDNA
library in the double stranded phagemids is then rendered single stranded by treatment with an
endonuclease, such as the Gene II product of the phage F1, and an exonuclease (Chang et al., 1993, which
disclosure is hereby incorporated by reference in its entirety). A biotinylated oligonucleotide comprising
the sequence of a fragment of a known GENSET cDNA, genomic DNA or fragment thereof is hybridized
to the single stranded phagemids. Preferably, the fragment comprises at least 10, 12, 15, 17, 18, 20, 23,
25, or 28 consecutive nucleotides of a sequence selected from the group consisting of the sequences of
SEQ ID Nos: 1-241 and sequences of clone inserts of the deposited clone pool.

Hybrids between the biotinylated oligonucleotide and phagemids are isolated by incubating the
hybrids with streptavidin coated paramagnetic beads and retrieving the beads with a magnet (Fry et al.,
1992, which disclosure is hereby incorporated by reference in its entirety). Thereafter, the resulting
phagemids are released from the beads and converted into double stranded DNA using a primer specific
for the GENSET cDNA or fragment used to design the biotinylated oligonucleotide. Alternatively,
protocols such as the Gene Trapper kit (Gibco BRL), which disclosure is hereby
incorporated by reference in its entirety, may be used. The resulting double stranded DNA is transformed
into bacteria. Homologous cDNAs to the GENSET cDNA or fragment thereof sequence are identified by
colony PCR or colony hybridization.

As a chromosome marker 

Chromosomal localizations of the cDNAs of the present invention were determined using 
information from public and proprietary databases. Table VIII lists the putative chromosomal location of 
the polynucleotides of the present invention. Column one lists the sequence identification number with the 
corresponding chromosomal location listed in column two. Thus, the present invention also relates to 
methods and compositions using the chromosomal location of the polynucleotides of the invention to 
construct a human high resolution map or to identify a given chromosome in a sample using any techniques 
known to those skilled in the art including those disclosed below. 

GENSET polynucleotides may also be mapped to their chromosomal locations using any methods 
or techniques known to those skilled in the art including radiation hybrid (RH) mapping, PCR-based 
mapping and Fluorescence in situ hybridization (FISH) mapping described below. 

Radiation hybrid mapping 

Radiation hybrid (RH) mapping is a somatic cell genetic approach that can be used for high 
resolution mapping of the human genome. In this approach, cell lines containing one or more human 
chromosomes are lethally irradiated, breaking each chromosome into fragments whose size depends on 
the radiation dose. These fragments are rescued by fusion with cultured rodent cells, yielding subclones 

containing different fragments of the human genome. This technique is described by Benham et al. 
(1989) and Cox et al. (1990), which disclosures are hereby incorporated by reference in their entireties. 
The random and independent nature of the subclones permits efficient mapping of any human genome 
marker. Human DNA isolated from a panel of 80-100 cell lines provides a mapping reagent for ordering 
GENSET cDNAs or genomic DNAs. In this approach, the frequency of breakage between markers is 
used to measure distance, allowing construction of fine resolution maps as has been done using 
conventional ESTs (Schuler et al., 1996), which disclosure is hereby incorporated by reference in its 
entirety. 
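
The use of breakage frequency as a distance measure can be sketched as follows. This is a minimal illustration that assumes a commonly used two-point estimator, in which the number of hybrids retaining only one of two markers is divided by the number expected if the markers were always separated, and the resulting frequency theta is converted to a map distance of -100 ln(1 - theta) centiRays. The estimator, the function names and the retention data are assumptions made for illustration and are not specified in the present description.

```python
import math


def breakage_frequency(marker_a, marker_b):
    """Two-point breakage frequency estimate between two markers.

    marker_a, marker_b: equal-length sequences of 0/1 retention scores,
    one entry per radiation hybrid cell line (hypothetical data format).
    """
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("markers must be scored on the same non-empty panel")
    n = len(marker_a)
    r_a = sum(marker_a) / n  # retention frequency of marker A
    r_b = sum(marker_b) / n  # retention frequency of marker B
    # Hybrids that retain exactly one of the two markers.
    discordant = sum(1 for a, b in zip(marker_a, marker_b) if a != b)
    # Discordant count expected if a break always separated the markers.
    expected = n * (r_a + r_b - 2 * r_a * r_b)
    if expected == 0:
        raise ValueError("markers retained in all or no hybrids are uninformative")
    return min(max(discordant / expected, 0.0), 1.0)


def distance_centirays(theta):
    """Convert a breakage frequency into a map distance in centiRays."""
    if theta >= 1.0:
        return math.inf
    return -100.0 * math.log(1.0 - theta)


# Hypothetical retention patterns across a 10-hybrid panel.
a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
theta = breakage_frequency(a, b)
print(round(theta, 3), round(distance_centirays(theta), 1))
```

In practice a panel of 80-100 cell lines, as mentioned above, would be scored rather than the ten hypothetical hybrids used here.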

RH mapping has been used to generate a high-resolution whole genome radiation hybrid map of 
human chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine kinase (TK) 
(Foster et al., 1996), the region surrounding the Gorlin syndrome gene (Obermayr et al., 1996), 60 loci 
covering the entire short arm of chromosome 12 (Raeymaekers et al., 1995), the region of human 
chromosome 22 containing the neurofibromatosis type 2 locus (Frazer et al., 1992) and 13 loci on the 
long arm of chromosome 5 (Warrington et al., 1991), which disclosures are hereby incorporated by 
reference in their entireties. 

Mapping of cDNAs to Human Chromosomes using PCR techniques 

GENSET cDNAs and genomic DNAs may be assigned to human chromosomes using PCR based 

methodologies. In such approaches, oligonucleotide primer pairs are designed from the cDNA sequence 
to minimize the chance of amplifying through an intron. Preferably, the oligonucleotide primers are 
18-23 bp in length and are designed for PCR amplification. The creation of PCR primers from known 
sequences is well known to those with skill in the art. For a review of PCR technology see Erlich (1992), 
which disclosure is hereby incorporated by reference in its entirety. 
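
A minimal sketch of deriving a primer pair of the preferred length from a cDNA sequence is given below. It only enforces the 18-23 bp length range and reports the product length expected from the cDNA; it makes no attempt to predict intron positions or melting temperatures. The function names and the test sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(cdna: str, start: int, product_len: int, primer_len: int = 20):
    """Pick a forward and a reverse primer flanking a cDNA segment.

    Returns (forward, reverse, expected_product_length), where the expected
    length is the distance between the outer ends of the two primers in the cDNA.
    """
    if not 18 <= primer_len <= 23:
        raise ValueError("primer length should be 18-23 bp")
    end = start + product_len
    if start < 0 or end > len(cdna) or product_len < 2 * primer_len:
        raise ValueError("requested product does not fit the cDNA")
    forward = cdna[start:start + primer_len]
    # The reverse primer anneals to the sense strand, so it is the
    # reverse complement of the last `primer_len` bases of the product.
    reverse = reverse_complement(cdna[end - primer_len:end])
    return forward, reverse, product_len


# Hypothetical 120-nt sequence used only to exercise the function.
fwd, rev, expected = primer_pair("ATGC" * 30, start=0, product_len=100)
print(fwd, rev, expected)
```

Comparing the observed product size with `expected` corresponds to the check, described in this section, that amplification has not proceeded through an intron.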

The primers are used in polymerase chain reactions (PCR) to amplify templates from total human 
genomic DNA. PCR conditions are as follows: 60 ng of genomic DNA is used as a template for PCR 
with 80 ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of a 32P-labeled 
deoxycytidine triphosphate. The PCR is performed in a microplate thermocycler (Techne) under the 
following conditions: 30 cycles of 94 degree Celsius, 1.4 min; 55 degree Celsius, 2 min; and 72 degree 
Celsius, 2 min; with a final extension at 72 degree Celsius for 10 min. The amplified products are 
analyzed on a 6% polyacrylamide sequencing gel and visualized by autoradiography. If the length of the 
resulting PCR product is identical to the distance between the ends of the primer sequences in the cDNA 
from which the primers are derived, then the PCR reaction is repeated with DNA templates from two 
panels of human-rodent somatic cell hybrids, BIOS PCRable DNA (BIOS Corporation) and NIGMS 
Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ). 

PCR is used to screen a series of somatic cell hybrid cell lines containing defined sets of human 
chromosomes for the presence of a given cDNA or genomic DNA. DNA is isolated from the somatic 
hybrids and used as starting templates for PCR reactions using the primer pairs from the GENSET 
cDNAs or genomic DNAs. Only those somatic cell hybrids with chromosomes containing the human 
gene corresponding to the GENSET cDNA or genomic DNA will yield an amplified fragment. The 
GENSET cDNAs or genomic DNAs are assigned to a chromosome by analysis of the segregation pattern 
of PCR products from the somatic hybrid DNA templates. The single human chromosome present in all 
cell hybrids that give rise to an amplified fragment is the chromosome containing that GENSET cDNA or 
genomic DNA. For a review of techniques and analysis of results from somatic cell gene mapping 
experiments, see Ledbetter et al. (1990), which disclosure is hereby incorporated by reference in its 
entirety. 
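
The segregation analysis described above can be sketched as a set intersection over the hybrid panel. In this hypothetical example each hybrid is described by the set of human chromosomes it carries and by its PCR result. Removing chromosomes that are also present in PCR-negative hybrids is an additional assumption that goes slightly beyond the rule stated above; the panel contents and names are invented for illustration.

```python
def assign_chromosome(panel, pcr_positive):
    """Return the human chromosome(s) consistent with the PCR results.

    panel: dict mapping hybrid name -> iterable of human chromosomes it contains
    pcr_positive: dict mapping hybrid name -> True if a fragment was amplified
    """
    positives = [set(panel[h]) for h, hit in pcr_positive.items() if hit]
    negatives = [set(panel[h]) for h, hit in pcr_positive.items() if not hit]
    if not positives:
        return set()
    # Chromosomes present in every hybrid that yields an amplified fragment.
    candidates = set.intersection(*positives)
    # Assumption: a chromosome carried by a negative hybrid is discordant.
    for chromosomes in negatives:
        candidates -= chromosomes
    return candidates


# Hypothetical four-hybrid panel.
panel = {
    "H1": {"1", "7", "X"},
    "H2": {"7", "12"},
    "H3": {"1", "12"},
    "H4": {"7", "X", "21"},
}
results = {"H1": True, "H2": True, "H3": False, "H4": True}
print(assign_chromosome(panel, results))
```

An unambiguous assignment corresponds to a result containing exactly one chromosome; an empty or multi-element result suggests that additional hybrids need to be tested.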

Mapping of cDNAs to Chromosomes Using Fluorescence in situ Hybridization 

Fluorescence in situ hybridization allows the GENSET cDNA or genomic DNA to be mapped to 

a particular location on a given chromosome. The chromosomes to be used for fluorescence in situ 
hybridization techniques may be obtained from a variety of sources including cell cultures, tissues, or 
whole blood. 

In a preferred embodiment, chromosomal localization of a GENSET cDNA or genomic DNA is 

obtained by FISH as described by Cherif et al. (1990), which disclosure is hereby incorporated by 
reference in its entirety. Metaphase chromosomes are prepared from phytohemagglutinin (PHA)-stimulated 
blood cell donors. PHA-stimulated lymphocytes from healthy males are cultured for 72 h in 
RPMI-1640 medium. For synchronization, methotrexate (10 µM) is added for 17 h, followed by addition 
of 5-bromodeoxyuridine (5-BudR, 0.1 mM) for 6 h. Colcemid (1 µg/ml) is added for the last 15 min 
before harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic solution of 
KCl (75 mM) at 37 degree Celsius for 15 min and fixed in three changes of methanol:acetic acid (3:1). 
The cell suspension is dropped onto a glass slide and air dried. The GENSET cDNA or genomic DNA is 
labeled with biotin-16 dUTP by nick translation according to the manufacturer's instructions (Bethesda 
Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 column (Pharmacia, Uppsala, 
Sweden) and precipitated. Just prior to hybridization, the DNA pellet is dissolved in hybridization buffer 
(50% formamide, 2 X SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon sperm DNA, pH 7) and the 
probe is denatured at 70 degree Celsius for 5-10 min. 

Slides kept at -20 degree Celsius are treated for 1 h at 37 degree Celsius with RNase A (100 
µg/ml), rinsed three times in 2 X SSC and dehydrated in an ethanol series. Chromosome preparations are 
denatured in 70% formamide, 2 X SSC for 2 min at 70 degree Celsius, then dehydrated at 4 degree 
Celsius. The slides are treated with proteinase K (10 µg/100 ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37 
degree Celsius for 8 min and dehydrated. The hybridization mixture containing the probe is placed on the 
slide, covered with a coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 
37 degree Celsius. After hybridization and post-hybridization washes, the biotinylated probe is detected 
by avidin-FITC and amplified with additional layers of biotinylated goat anti-avidin and avidin-FITC. 
For chromosomal localization, fluorescent R-bands are obtained as previously described (Cherif et al., 
1990). The slides are observed under a LEICA fluorescence microscope (DMRXA). Chromosomes are 
counterstained with propidium iodide and the fluorescent signal of the probe appears as two symmetrical 
yellow-green spots on both chromatids of the fluorescent R-band chromosome (red). Thus, a particular 
GENSET cDNA or genomic DNA may be localized to a particular cytogenetic R-band on a given 
chromosome. 

Use of cDNAs to Construct or Expand Chromosome Maps 

Once the GENSET cDNAs or genomic DNAs have been assigned to particular chromosomes 
using any technique known to those skilled in the art, particularly those described 
herein, they may be utilized to construct a high resolution map of the chromosomes on which they are 
located or to identify the chromosomes in a sample. 

Chromosome mapping involves assigning a given unique sequence to a particular chromosome as 

described above. Once the unique sequence has been mapped to a given chromosome, it is ordered 
relative to other unique sequences located on the same chromosome. One approach to chromosome 
mapping utilizes a series of yeast artificial chromosomes (YACs) bearing several thousand long inserts 
derived from the chromosomes of the organism from which the GENSET cDNAs or genomic DNAs are 
obtained. This approach is described in Nagaraja et al. (1997), which disclosure is hereby incorporated 
by reference in its entirety. Briefly, in this approach each chromosome is broken into overlapping pieces 
which are inserted into the YAC vector. The YAC inserts are screened using PCR or other methods to 
determine whether they include the GENSET cDNA or genomic DNA whose position is to be 
determined. Once an insert has been found which includes the GENSET cDNA or genomic DNA, the 
insert can be analyzed by PCR or other methods to determine whether the insert also contains other 
sequences known to be on the chromosome or in the region from which the GENSET cDNA or genomic 
DNA was derived. This process can be repeated for each insert in the YAC library to determine the 
location of each of the GENSET cDNAs or genomic DNAs relative to one another and to other known 
chromosomal markers. In this way, a high resolution map of the distribution of numerous unique markers 
along each of the organism's chromosomes may be obtained. 
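
The YAC screening logic can be illustrated with a short sketch that records, for every pair of markers, the YAC clones on which both were detected. Markers sharing a clone lie within one insert length of each other, which is the information used to order them. The clone and marker names are hypothetical, and the sketch does not attempt the ordering step itself.

```python
from collections import defaultdict
from itertools import combinations


def linked_markers(yac_hits):
    """Map each marker pair to the set of YAC clones containing both.

    yac_hits: dict mapping YAC clone name -> set of markers detected in
    that clone, for example by PCR screening (hypothetical data format).
    """
    links = defaultdict(set)
    for clone, markers in yac_hits.items():
        # Sort so that each pair is stored under a single canonical key.
        for a, b in combinations(sorted(markers), 2):
            links[(a, b)].add(clone)
    return dict(links)


# Hypothetical screening results for three YAC clones.
hits = {
    "yA": {"GENSET_1", "STS_10"},
    "yB": {"STS_10", "STS_11"},
    "yC": {"GENSET_1", "STS_10"},
}
for pair, clones in sorted(linked_markers(hits).items()):
    print(pair, sorted(clones))
```

Here GENSET_1 and STS_11 never share a clone, so they are linked only through STS_10, which suggests the order GENSET_1, STS_10, STS_11 under these invented data.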

Identification of genes associated with hereditary diseases or drug response 

This example illustrates an approach useful for the association of GENSET cDNAs or genomic 
DNAs with particular phenotypic characteristics. In this example, a particular GENSET cDNA or 
genomic DNA is used as a test probe to associate that GENSET cDNA or genomic DNA with a particular 
phenotypic characteristic. 

GENSET cDNAs or genomic DNAs are mapped to a particular location on a human chromosome 
using techniques such as those described herein or other techniques known in the art. A search of 
Mendelian Inheritance in Man (V. McKusick, Mendelian Inheritance in Man, available on line through 
Johns Hopkins University Welch Medical Library) reveals the region of the human chromosome which 
contains the GENSET cDNA or genomic DNA to be a very gene rich region containing several known 
genes and several diseases or phenotypes for which genes have not been identified. The gene 
corresponding to this GENSET cDNA or genomic DNA thus becomes an immediate candidate for each 
of these genetic diseases. 

Cells from patients with these diseases or phenotypes are isolated and expanded in culture. PCR 
primers from the GENSET cDNA or genomic DNA are used to screen genomic DNA, mRNA or cDNA 
obtained from the patients. GENSET cDNAs or genomic DNAs that are not amplified in the patients can 
be positively associated with a particular disease by further analysis. Alternatively, the PCR analysis may 

yield fragments of different lengths when the sample is derived from an individual having the 
phenotype associated with the disease than when the sample is derived from a healthy individual, 
indicating that the gene containing the cDNA may be responsible for the genetic disease. 

Uses of polynucleotides in recombinant vectors 

The present invention also relates to recombinant vectors, which include the isolated 
polynucleotides of the present invention, or fragments thereof, and to host cells recombinant for a 
polynucleotide of the invention, such as the above vectors, as well as to methods of making such vectors 
and host cells and for using them for production of GENSET polypeptides by recombinant techniques. 

Recombinant Vectors 

The term "vector" is used herein to designate either a circular or a linear DNA or RNA molecule, 
which is either double-stranded or single-stranded, and which comprises at least one polynucleotide of 
interest that is sought to be transferred into a cell host or into a unicellular or multicellular host organism. 
The present invention encompasses a family of recombinant vectors that comprise a regulatory 
polynucleotide and/or a coding polynucleotide derived from either the GENSET genomic sequence or the 
cDNA sequence. Generally, a recombinant vector of the invention may comprise any of the 
polynucleotides described herein, including regulatory sequences, coding sequences and polynucleotide 
constructs, as well as any GENSET primer or probe as defined herein. 

In a first preferred embodiment, a recombinant vector of the invention is used to amplify the 
inserted polynucleotide derived from a GENSET genomic sequence or a GENSET cDNA, for example 
any cDNA selected from the group consisting of sequences of SEQ ID Nos: 1-241, sequences of clone 
inserts of the deposited clone pool, variants and fragments thereof in a suitable cell host, this 
polynucleotide being amplified every time that the recombinant vector replicates. 

A second preferred embodiment of the recombinant vectors according to the invention comprises 
expression vectors comprising either a regulatory polynucleotide or a coding nucleic acid of the 
invention, or both. Within certain embodiments, expression vectors are employed to express a GENSET 
polypeptide which can then be purified and, for example, be used in ligand screening assays or as an 
immunogen in order to raise specific antibodies directed against the GENSET protein. In other 
embodiments, the expression vectors are used for constructing transgenic animals and also for gene 
therapy. Expression requires that appropriate signals are provided in the vectors, said signals including 
various regulatory elements, such as enhancers/promoters from both viral and mammalian sources that 
drive expression of the genes of interest in host cells. Dominant drug selection markers for establishing 
permanent, stable cell clones expressing the products are generally included in the expression vectors of 
the invention, as are elements that link expression of the drug selection markers to expression of the 
polypeptide. 

More particularly, the present invention relates to expression vectors which include nucleic acids 
encoding a GENSET protein, preferably a GENSET protein with an amino acid sequence selected from 

the group consisting of sequences of SEQ ID Nos: 242-482, mature polypeptides included in sequences of 
SEQ ID Nos: 242-272 and 274-384, and sequences of full-length or mature polypeptides encoded by the 
clone inserts of the deposited clone pool, as well as variants and fragments thereof. The polynucleotides 
of the present invention may be used to express an encoded protein in a host organism to produce a 
beneficial effect. In such procedures, the encoded protein may be transiently expressed in the host 
organism or stably expressed in the host organism. The encoded protein may have any of the activities 
described herein. The encoded protein may be a protein which the host organism lacks or, alternatively, 
the encoded protein may augment the existing levels of the protein in the host organism. 

Some of the elements which can be found in the vectors of the present invention are described in 
further detail in the following sections. 

General features of the expression vectors of the invention 

A recombinant vector according to the invention comprises, but is not limited to, a YAC (Yeast 
Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a cosmid, a 
plasmid or even a linear DNA molecule which may comprise a chromosomal, non-chromosomal, 
semi-synthetic and synthetic DNA. Such a recombinant vector can comprise a transcriptional unit comprising 
an assembly of: 

(1) a genetic element or elements having a regulatory role in gene expression, for example 
promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp in 
length, that act on the promoter to increase the transcription. 

(2) a structural or coding sequence which is transcribed into mRNA and eventually translated into 
a polypeptide, said structural or coding sequence being operably linked to the regulatory elements 
described in (1); and 

(3) appropriate transcription initiation and termination sequences. Structural units intended for 
use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular 
secretion of translated protein by a host cell. Alternatively, when a recombinant protein is expressed 
without a leader or transport sequence, it may include an N-terminal residue. This residue may or may not 
be subsequently cleaved from the expressed recombinant protein to provide a final product. 

Generally, recombinant expression vectors will include origins of replication, selectable markers 
permitting transformation of the host cell, and a promoter derived from a highly expressed gene to direct 
transcription of a downstream structural sequence. The heterologous structural sequence is assembled in 
appropriate phase with translation initiation and termination sequences, and preferably a leader sequence 
capable of directing secretion of the translated protein into the periplasmic space or the extracellular 
medium. In a specific embodiment wherein the vector is adapted for transfecting and expressing desired 
sequences in mammalian host cells, preferred vectors will comprise an origin of replication in the desired 
host, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation 
signal, splice donor and acceptor sites, transcriptional termination sequences, and 5'-flanking 
non-transcribed sequences. DNA sequences derived from the SV40 viral genome, for example SV40 origin, 
early promoter, enhancer, splice and polyadenylation signals may be used to provide the required 
non-transcribed genetic elements. 

The in vivo expression of a GENSET polypeptide of the present invention may be useful in order 
to correct a genetic defect related to the expression of the native gene in a host organism or to the 
production of a biologically inactive GENSET protein. Consequently, the present invention also 
comprises recombinant expression vectors mainly designed for the in vivo production of a GENSET 
polypeptide of the present invention by the introduction of the appropriate genetic material in the 
organism or the patient to be treated. This genetic material may be introduced in vitro in a cell that has 
been previously extracted from the organism, the modified cell being subsequently reintroduced in the 
said organism, or directly in vivo into the appropriate tissue. 

Regulatory Elements 

The suitable promoter regions used in the expression vectors according to the present invention 
are chosen taking into account the cell host in which the heterologous gene has to be expressed. The 
particular promoter employed to control the expression of a nucleic acid sequence of interest is not 
believed to be important, so long as it is capable of directing the expression of the nucleic acid in the 
targeted cell. Thus, where a human cell is targeted, it is preferable to position the nucleic acid coding 
region adjacent to and under the control of a promoter that is capable of being expressed in a human cell, 
such as, for example, a human or a viral promoter. 

A suitable promoter may be heterologous with respect to the nucleic acid for which it controls the 
expression or alternatively can be endogenous to the native polynucleotide containing the coding 
sequence to be expressed. Additionally, the promoter is generally heterologous with respect to the 
recombinant vector sequences within which the construct promoter/coding sequence has been inserted. 

Promoter regions can be selected from any desired gene using, for example, CAT 
(chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors. 
Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA polymerase 
promoters, the gpt, lambda PR, PL and trp promoters (EP 0036776), the polyhedrin promoter, or the p10 
protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly et al., 1992), which 
disclosures are hereby incorporated by reference in their entireties, the lambda PR promoter or also the trc 
promoter. 

Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late 
SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and 
promoter is well within the level of ordinary skill in the art. For example, one may refer to the book of 
Sambrook et al. (1989) or also to the procedures described by Fuller et al. (1996), which disclosures are 
hereby incorporated by reference in their entireties. 

Other regulatory elements 

Where a cDNA insert is employed, one will typically desire to include a polyadenylation signal to 
effect proper polyadenylation of the gene transcript. The nature of the polyadenylation signal is not 
believed to be crucial to the successful practice of the invention, and any such sequence may be employed 
such as human growth hormone and SV40 polyadenylation signals. Also contemplated as an element of 
the expression cassette is a terminator. These elements can serve to enhance message levels and to 
minimize read through from the cassette into other sequences. 

Selectable Markers 

Selectable markers confer an identifiable change to the cell permitting easy identification of cells 
containing the expression construct. The selectable marker genes for selection of transformed host cells 
are preferably dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. 
cerevisiae or tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for 
mycobacteria, this latter marker being a negative selection marker. 

Preferred Vectors 
Bacterial vectors 

As a representative but non-limiting example, useful expression vectors for bacterial use can 
comprise a selectable marker and a bacterial origin of replication derived from commercially available 
plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include, for 
example, pKK223-3 (Pharmacia, Uppsala, Sweden), and pGEM1 (Promega Biotec, Madison, WI, USA). 

Large numbers of other suitable vectors are known to those of skill in the art, and commercially 
available, such as the following bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10, 
phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene); 
ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); pWLNEO, pSV2CAT, pOG44, pXT1, 
pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); pQE-30 (QIAexpress). 

Bacteriophage vectors 

The P1 bacteriophage vector may contain large inserts ranging from about 80 to about 100 kb. 

The construction of P1 bacteriophage vectors such as p158 or p158/neo8 is notably described by 
Sternberg (1992, 1994), which disclosures are hereby incorporated by reference in their entireties. 
Recombinant P1 clones comprising GENSET nucleotide sequences may be designed for inserting large 
polynucleotides of more than 40 kb (see Linton et al., 1993), which disclosure is hereby incorporated by 
reference in its entirety. To generate P1 DNA for transgenic experiments, a preferred protocol is the 
protocol described by McCormick et al. (1994), which disclosure is hereby incorporated by reference in 
its entirety. Briefly, E. coli (preferably strain NS3529) harboring the P1 plasmid are grown overnight in a 
suitable broth medium containing 25 µg/ml of kanamycin. The P1 DNA is prepared from the E. coli by 
alkaline lysis using the Qiagen Plasmid Maxi kit (Qiagen, Chatsworth, CA, USA), according to the 
manufacturer's instructions. The P1 DNA is purified from the bacterial lysate on two Qiagen-tip 500 
columns, using the washing and elution buffers contained in the kit. A phenol/chloroform extraction is 
then performed before precipitating the DNA with 70% ethanol. After solubilizing the DNA in TE (10 
mM Tris-HCl, pH 7.4, 1 mM EDTA), the concentration of the DNA is assessed by spectrophotometry. 
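
The spectrophotometric assessment mentioned above is not detailed further in this description. As a hedged illustration, the sketch below applies the widely used convention that an absorbance of 1.0 at 260 nm in a 1 cm path corresponds to about 50 µg/ml of double-stranded DNA, and reports the A260/A280 ratio as a rough purity indicator. The conversion factor, the function names and the readings are assumptions, not part of the protocol cited above.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # assumed conventional factor for dsDNA


def dsdna_concentration_ug_per_ml(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate the dsDNA concentration of the undiluted sample in µg/ml."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 / path_length_cm * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are commonly taken to indicate
    DNA that is largely free of protein."""
    return a260 / a280


# Hypothetical readings for a 1:20 dilution of the DNA in TE.
print(dsdna_concentration_ug_per_ml(0.5, dilution_factor=20))
print(round(purity_ratio(0.9, 0.5), 2))
```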

When the goal is to express a P1 clone comprising GENSET nucleotide sequences in a transgenic 
animal, typically in transgenic mice, it is desirable to remove vector sequences from the P1 DNA 
fragment, for example by cleaving the P1 DNA at rare-cutting sites within the P1 polylinker (SfiI, NotI or 
SalI). The P1 insert is then purified from vector sequences on a pulsed-field agarose gel, using methods 
similar to those originally reported for the isolation of DNA from YACs (see e.g., Schedl et al., 1993a; 
Peterson et al., 1993), which disclosures are hereby incorporated by reference in their entireties. At this 
stage, the resulting purified insert DNA can be concentrated, if necessary, on a Millipore Ultrafree-MC 
Filter Unit (Millipore, Bedford, MA, USA - 30,000 molecular weight limit) and then dialyzed against 
microinjection buffer (10 mM Tris-HCl, pH 7.4; 250 µM EDTA) containing 100 mM NaCl, 30 µM 
spermine, 70 µM spermidine on a microdialysis membrane (type VS, 0.025 µm from Millipore). The 
intactness of the purified P1 DNA insert is assessed by electrophoresis on 1% agarose (Sea Kem GTG; 
FMC Bio-products) pulse-field gel and staining with ethidium bromide. 

Viral vectors 

In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus 
vectors according to the invention are those described by Feldman and Steg (1996), or Ohno et al. 
(1994), which disclosures are hereby incorporated by reference in their entireties. Another preferred 
recombinant adenovirus according to this specific embodiment of the present invention is the human 
adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (French patent application No. 
FR-93.05954), which disclosure is hereby incorporated by reference in its entirety. 

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the 
recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo, 
particularly to mammals, including humans. These vectors provide efficient delivery of genes into cells, 
and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host. 
Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or in vivo gene 
delivery vehicles of the present invention include retroviruses selected from the group consisting of 
Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus and Rous Sarcoma virus. 
Particularly preferred Murine Leukemia Viruses include the 4070A and the 1504A viruses, Abelson 
(ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No VR-590), Rauscher (ATCC No 
VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190; PCT Application No WO 94/24298). 
Particularly preferred Rous Sarcoma Viruses include Bryan high titer (ATCC Nos VR-334, VR-657, 
VR-726, VR-659 and VR-728). Other preferred retroviral vectors are those described in Roth et al. (1996), 
PCT Application No WO 93/25234, PCT Application No WO 94/06920, Roux et al. (1989), Julan et al. 
(1992), and Neda et al. (1991), which disclosures are hereby incorporated by reference in their entireties. 

Yet another viral vector system that is contemplated by the invention comprises the 
adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that requires 
another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a 
productive life cycle (Muzyczka et al., 1992), which disclosure is hereby incorporated by reference in its 
entirety. It is also one of the few viruses that may integrate its DNA into non-dividing cells, and exhibits 
a high frequency of stable integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989), 
which disclosures are hereby incorporated by reference in their entireties. One advantageous feature of 
AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells. 

BAC vectors 

The bacterial artificial chromosome (BAC) cloning system (Shizuya et al., 1992), which 
disclosure is hereby incorporated by reference in its entirety, has been developed to stably maintain large 
fragments of genomic DNA (100-300 kb) in E. coli. A preferred BAC vector comprises a pBeloBAC11
vector that has been described by Kim et al. (1996), which disclosure is hereby incorporated by reference
in its entirety. BAC libraries are prepared with this vector using size-selected genomic DNA that has
been partially digested using enzymes that permit ligation into either the BamHI or HindIII sites in the
vector. Flanking these cloning sites are T7 and SP6 RNA polymerase transcription initiation sites that
can be used to generate end probes by either RNA transcription or PCR methods. After the construction
of a BAC library in E. coli, BAC DNA is purified from the host cell as a supercoiled circle. Converting
these circular molecules into a linear form precedes both size determination and introduction of the BACs
into recipient cells. The cloning site is flanked by two Not I sites, permitting cloned segments to be
excised from the vector by Not I digestion. Alternatively, the DNA insert contained in the pBeloBAC11
vector may be linearized by treatment of the BAC vector with the commercially available enzyme lambda
terminase that leads to the cleavage at the unique cosN site, but this cleavage method results in a full
length BAC clone containing both the insert DNA and the BAC sequences.
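
By way of illustration only, the Not I excision step described above may be sketched in Python as follows. The sketch locates Not I recognition sites (GCGGCCGC, cut as GC^GGCCGC on the top strand) in a circular sequence and returns the fragments that a complete digest would be expected to yield. The function names and the toy sequence in the usage note are hypothetical, and the sketch assumes that no recognition site spans the arbitrary origin of the circular sequence; it is not the pBeloBAC11 sequence and does not model partial digestion.

```python
from typing import List

NOTI_SITE = "GCGGCCGC"   # Not I recognition sequence
NOTI_CUT_OFFSET = 2      # top-strand cut falls after the first "GC" (GC^GGCCGC)


def find_sites(seq: str, site: str = NOTI_SITE) -> List[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest_circular(seq: str, site: str = NOTI_SITE,
                    offset: int = NOTI_CUT_OFFSET) -> List[str]:
    """Cut a circular molecule at every site and return the linear fragments.

    Assumption (illustrative only): no site spans the origin of `seq`.
    """
    seq = seq.upper()
    cuts = [p + offset for p in find_sites(seq, site)]
    if not cuts:
        return [seq]  # no site found: the circle stays uncut
    fragments = []
    for i, cut in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        if nxt > cut:
            fragments.append(seq[cut:nxt])
        else:
            # wrap around the origin of the circular sequence
            fragments.append(seq[cut:] + seq[:nxt])
    return fragments
```

For a hypothetical toy vector such as "AAAA" + "GCGGCCGC" + "TTTTTT" + "GCGGCCGC" + "CCCC", digest_circular returns two fragments, one carrying the segment between the two sites (the cloned insert in this toy model) and one carrying the remainder of the circle (the vector backbone).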

Baculovirus: 

Another specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector
(Pharmingen) that is used to transfect the SF9 cell line (ATCC No. CRL 1711) which is derived from
Spodoptera frugiperda. Other suitable vectors for the expression of the GENSET polypeptide of the
present invention in a baculovirus expression system include those described by Chai et al. (1993),
Vlasak et al. (1983), and Lenhard et al. (1996), which disclosures are hereby incorporated by reference
in their entireties.

Delivery Of The Recombinant Vectors: 

To effect expression of the polynucleotides and polynucleotide constructs of the invention, these 
constructs must be delivered into a cell. This delivery may be accomplished in vitro, as in laboratory
procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states.
One mechanism is viral infection where the expression construct is encapsulated in an infectious viral 
particle. 

Several non-viral methods for the transfer of polynucleotides into cultured mammalian cells are 
also contemplated by the present invention, and include, without being limited to, calcium phosphate
precipitation (Graham et al., 1973; Chen et al., 1987); DEAE-dextran (Gopal, 1985); electroporation
(Tur-Kaspa et al., 1986; Potter et al., 1984); direct microinjection (Harland et al., 1985); DNA-loaded
liposomes (Nicolau et al., 1982; Fraley et al., 1979); and receptor-mediated transfection (Wu and Wu,
1987, 1988), which disclosures are hereby incorporated by reference in their entireties. Some of these
techniques may be successfully adapted for in vivo or ex vivo use.

Once the expression polynucleotide has been delivered into the cell, it may be stably integrated 
into the genome of the recipient cell. This integration may be in the cognate location and orientation via 
homologous recombination (gene replacement) or it may be integrated in a random, non-specific location 
(gene augmentation). In yet further embodiments, the nucleic acid may be stably maintained in the cell as 
a separate, episomal segment of DNA. Such nucleic acid segments or "episomes" encode sequences
sufficient to permit maintenance and replication independent of or in synchronization with the host cell 
cycle. 

One specific embodiment for a method for delivering a protein or peptide to the interior of a cell 
of a vertebrate in vivo comprises the step of introducing a preparation comprising a physiologically
acceptable carrier and a naked polynucleotide operatively coding for the polypeptide of interest into the
interstitial space of a tissue comprising the cell, whereby the naked polynucleotide is taken up into the 
interior of the cell and has a physiological effect. This is particularly applicable for transfer in vitro but it 
may be applied to in vivo as well. 

Compositions for use in vitro and in vivo comprising a "naked" polynucleotide are described in
PCT application No. WO 90/11092 (Vical Inc.) and also in PCT application No. WO 95/11307 (Institut
Pasteur, INSERM, Universite d'Ottawa) as well as in the articles of Tacson et al. (1996) and of Huygen et
al. (1996), which disclosures are hereby incorporated by reference in their entireties.

In still another embodiment of the invention, the transfer of a naked polynucleotide of the
invention, including a polynucleotide construct of the invention, into cells may be carried out by
particle bombardment (biolistics), said particles being DNA-coated microprojectiles accelerated to a high
velocity allowing them to pierce cell membranes and enter cells without killing them, such as described
by Klein et al. (1987), which disclosure is hereby incorporated by reference in its entirety.

In a further embodiment, the polynucleotide of the invention may be entrapped in a liposome 
(Ghosh and Bacchawat, 1991; Wong et al, 1980; Nicolau et al, 1987), which disclosures are hereby 
incorporated by reference in their entireties. 

In a specific embodiment, the invention provides a composition for the in vivo production of the 
GENSET protein or polypeptide described herein. It comprises a naked polynucleotide operatively
coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for 
introduction into a tissue to cause cells of the tissue to express the said protein or polypeptide. 

The amount of vector to be injected into the desired host organism varies according to the site of
injection. As an indicative dose, between 0.1 and 100 µg of the vector will be injected into an animal
body, preferably a mammal body, for example a mouse body.

In another embodiment, the vector according to the invention may be introduced in vitro into a
host cell, preferably a host cell previously harvested from the animal to be treated and more preferably
a somatic cell such as a muscle cell. In a subsequent step, the cell that has been transformed with the
vector coding for the desired GENSET polypeptide or the desired fragment thereof is reintroduced into
the animal body in order to deliver the recombinant protein within the body either locally or systemically.

Secretion vectors 

Some of the GENSET cDNAs or genomic DNAs of the invention may also be used to construct 
secretion vectors capable of directing the secretion of the proteins encoded by genes inserted in the 
vectors. Such secretion vectors may facilitate the purification or enrichment of the proteins encoded by 
genes inserted therein by reducing the number of background proteins from which the desired protein
must be purified or enriched. Exemplary secretion vectors are described below. 

The secretion vectors of the present invention include a promoter capable of directing gene 
expression in the host cell, tissue, or organism of interest. Such promoters include the Rous Sarcoma 
Virus promoter, the SV40 promoter, the human cytomegalovirus promoter, and other promoters familiar 
to those skilled in the art.

A signal sequence from a polynucleotide of the invention, preferably a signal sequence selected
from the group of signal sequences of SEQ ID Nos: 1-31 and 33-143 and signal sequences of clone
inserts of the deposited clone pool, is operably linked to the promoter such that the mRNA transcribed
from the promoter will direct the translation of the signal peptide. The host cell, tissue, or organism may
be any cell, tissue, or organism which recognizes the signal peptide encoded by the signal sequence in the
GENSET cDNA or genomic DNA. Suitable hosts include mammalian cells, tissues or organisms, avian
cells, tissues, or organisms, insect cells, tissues or organisms, or yeast.

In addition, the secretion vector contains cloning sites for inserting genes encoding the proteins
which are to be secreted. The cloning sites facilitate the cloning of the insert gene in frame with the
signal sequence such that a fusion protein in which the signal peptide is fused to the protein encoded by
the inserted gene is expressed from the mRNA transcribed from the promoter. The signal peptide directs
the extracellular secretion of the fusion protein.
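
The requirement that the insert gene be cloned in frame with the signal sequence may be illustrated by the following minimal Python sketch, which joins a signal coding sequence to an insert coding sequence, checks that the junction preserves the reading frame, and translates the fusion using the standard genetic code. The function names and the short toy sequences in the usage note are hypothetical and are not GENSET sequences; the sketch also ignores linker sequences contributed by the cloning sites and signal peptide cleavage.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon.

    Any trailing partial codon is ignored.
    """
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def fuse_in_frame(signal_cds: str, insert_cds: str):
    """Return (fused DNA, fused protein) for a signal sequence joined to an insert.

    Raises ValueError if the signal sequence would shift the insert out of frame,
    does not start with ATG, or contains an internal stop codon.
    """
    signal_cds, insert_cds = signal_cds.upper(), insert_cds.upper()
    if len(signal_cds) % 3 != 0:
        raise ValueError("signal sequence length is not a multiple of 3: frameshift")
    if not signal_cds.startswith("ATG"):
        raise ValueError("signal sequence does not begin with an ATG start codon")
    if "*" in translate(signal_cds):
        raise ValueError("signal sequence contains a stop codon")
    fused = signal_cds + insert_cds
    return fused, translate(fused)
```

With the hypothetical inputs "ATGAAACTGCTG" (signal) and "GCTGGTTAA" (insert), fuse_in_frame returns a fusion whose translation reads through the junction; a signal sequence one nucleotide shorter is rejected because it would place the insert out of frame.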

The secretion vector may be DNA or RNA and may integrate into the chromosome of the host, be
stably maintained as an extrachromosomal replicon in the host, be an artificial chromosome, or be
transiently present in the host. Preferably, the secretion vector is maintained in multiple copies in each
host cell. As used herein, multiple copies means at least 2, 5, 10, 20, 25, 50 or more than 50 copies per
cell. In some embodiments, the multiple copies are maintained extrachromosomally. In other
embodiments, the multiple copies result from amplification of a chromosomal sequence.

Many nucleic acid backbones suitable for use as secretion vectors are known to those skilled in 
the art, including retroviral vectors, SV40 vectors, Bovine Papilloma Virus vectors, yeast integrating 
plasmids, yeast episomal plasmids, yeast artificial chromosomes, human artificial chromosomes, P 
element vectors, baculovirus vectors, or bacterial plasmids capable of being transiently introduced into
the host.

The secretion vector may also contain a polyA signal such that the polyA signal is located 
downstream of the gene inserted into the secretion vector. 

After the gene encoding the protein for which secretion is desired is inserted into the secretion 
vector, the secretion vector is introduced into the host cell, tissue, or organism using calcium phosphate 
precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection, viral particles or as naked
DNA. The protein encoded by the inserted gene is then purified or enriched from the supernatant using
conventional techniques such as ammonium sulfate precipitation, immunoprecipitation,
immunochromatography, size exclusion chromatography, ion exchange chromatography, and HPLC.
Alternatively, the secreted protein may be in a sufficiently enriched or pure state in the supernatant or
growth media of the host to permit it to be used for its intended purpose without further enrichment.

The signal sequences may also be inserted into vectors designed for gene therapy. In such 
vectors, the signal sequence is operably linked to a promoter such that mRNA transcribed from the 
promoter encodes the signal peptide. A cloning site is located downstream of the signal sequence such 
that a gene encoding a protein whose secretion is desired may readily be inserted into the vector and fused
to the signal sequence. The vector is introduced into an appropriate host cell. The protein expressed from
the promoter is secreted extracellularly, thereby producing a therapeutic effect. 

Cell Hosts 

Another object of the invention comprises a host cell that has been transformed or transfected 
with one of the polynucleotides described herein, and in particular a polynucleotide either comprising a
GENSET regulatory polynucleotide or the polynucleotide coding for a GENSET polypeptide. Also
included are host cells that are transformed (prokaryotic cells) or that are transfected (eukaryotic cells)
with a recombinant vector such as one of those described above. However, the cell hosts of the present
invention can comprise any of the polynucleotides of the present invention. In a preferred embodiment,
host cells contain a polynucleotide sequence comprising a sequence selected from the group consisting of
sequences of SEQ ID Nos: 1-241, sequences of clone inserts of the deposited clone pool, variants and 
fragments thereof. Preferred host cells used as recipients for the expression vectors of the invention are 
the following: 

a) Prokaryotic host cells: Escherichia coli strains (e.g., DH5-α strain), Bacillus subtilis, Salmonella
typhimurium, and strains from species like Pseudomonas, Streptomyces and Staphylococcus.

b) Eukaryotic host cells: HeLa cells (ATCC No. CCL2; No. CCL2.1; No. CCL2.2), CV-1 cells
(ATCC No. CCL70), COS cells (ATCC No. CRL1650; No. CRL1651), Sf-9 cells (ATCC No. CRL1711),
C127 cells (ATCC No. CRL-1804), 3T3 (ATCC No. CRL-6361), CHO (ATCC No. CCL-61), human
kidney 293 (ATCC No. 45504; No. CRL-1573) and BHK (ECACC No. 84100501; No. 84111301).

c) Other mammalian host cells.

The present invention also encompasses primary, secondary, and immortalized homologously 
recombinant host cells of vertebrate origin, preferably mammalian origin and particularly human origin, 
that have been engineered to: a) insert exogenous (heterologous) polynucleotides into the endogenous 
chromosomal DNA of a targeted gene, b) delete endogenous chromosomal DNA, and/or c) replace
endogenous chromosomal DNA with exogenous polynucleotides. Insertions, deletions, and/or
replacements of polynucleotide sequences may be to the coding sequences of the targeted gene and/or to 
regulatory regions, such as promoter and enhancer sequences, operably associated with the targeted gene. 

In addition to encompassing host cells containing the vector constructs discussed herein, the 
invention also encompasses primary, secondary, and immortalized host cells of vertebrate origin,
particularly mammalian origin, that have been engineered to delete or replace endogenous genetic
material (e.g., coding sequence), and/or to include genetic material (e.g., heterologous polynucleotide
sequences) that is operably associated with the polynucleotides of the invention, and which activates,
alters, and/or amplifies endogenous polynucleotides. For example, techniques known in the art may be
used to operably associate heterologous control regions (e.g., promoter and/or enhancer) and endogenous
polynucleotide sequences via homologous recombination, see, e.g., U.S. Patent No. 5,641,670, issued
June 24, 1997; International Publication No. WO 96/29411, published September 26, 1996; International
Publication No. WO 94/12650, published August 4, 1994; Koller et al. (1989); and Zijlstra et al. (1989)
(the disclosures of each of which are incorporated by reference in their entireties).

The present invention further relates to a method of making a homologously recombinant host 
cell in vitro or in vivo, wherein the expression of a targeted gene not normally expressed in the cell is 
altered. Preferably the alteration causes expression of the targeted gene under normal growth conditions 
or under conditions suitable for producing the polypeptide encoded by the targeted gene. The method 
comprises the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide construct, said
polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding
sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; and 
(b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous 
recombination. 

The present invention further relates to a method of altering the expression of a targeted gene in a
cell in vitro or in vivo wherein the gene is not normally expressed in the cell, comprising the steps of: (a)
transfecting the cell in vitro or in vivo with a polynucleotide construct, said polynucleotide construct
comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an
unpaired splice donor site, if necessary, thereby producing a transfected cell; (b) maintaining the
transfected cell in vitro or in vivo under conditions appropriate for homologous recombination, thereby
producing a homologously recombinant cell; and (c) maintaining the homologously recombinant cell in 
vitro or in vivo under conditions appropriate for expression of the gene. 

The present invention further relates to a method of making a polypeptide of the present invention 
by altering the expression of a targeted endogenous gene in a cell in vitro or in vivo wherein the gene is
not normally expressed in the cell, comprising the steps of: (a) transfecting the cell in vitro with a
polynucleotide construct, said polynucleotide construct comprising: (i) a targeting sequence; (ii) a
regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary,
thereby producing a transfected cell; (b) maintaining the transfected cell in vitro or in vivo under
conditions appropriate for homologous recombination, thereby producing a homologously recombinant
cell; and (c) maintaining the homologously recombinant cell in vitro or in vivo under conditions
appropriate for expression of the gene thereby making the polypeptide.

The present invention further relates to a polynucleotide construct which alters the expression of a 
targeted gene in a cell type in which the gene is not normally expressed. This occurs when the 
polynucleotide construct is inserted into the chromosomal DNA of the target cell, wherein said
polynucleotide construct comprises: a) a targeting sequence; b) a regulatory sequence and/or coding
sequence; and c) an unpaired splice-donor site, if necessary. Further included is a polynucleotide
construct, as described above, wherein said polynucleotide construct further comprises a polynucleotide
which encodes a polypeptide and is in-frame with the targeted endogenous gene after homologous
recombination with chromosomal DNA.

The compositions may be produced, and methods performed, by techniques known in the art, 
such as those described in U.S. Patent Nos: 6,054,288; 6,048,729; 6,048,724; 6,048,524; 5,994,127;
5,968,502; 5,965,125; 5,869,239; 5,817,789; 5,783,385; 5,733,761; 5,641,670; 5,580,734; International
Publication Nos: WO 96/29411, WO 94/12650; and scientific articles described by Koller et al. (1994)
(the disclosures of each of which are incorporated by reference in their entireties).

The GENSET gene expression in mammalian cells, preferably human cells, may be rendered 
defective, or alternatively may be altered by replacing the endogenous GENSET gene in the genome of an 
animal cell by a GENSET polynucleotide according to the invention. These genetic alterations may be 
generated by homologous recombination using previously described specific polynucleotide constructs. 

Mammal zygotes, such as murine zygotes, may be used as cell hosts. For example, murine
zygotes may undergo microinjection with a purified DNA molecule of interest, for example a purified
DNA molecule that has previously been adjusted to a concentration ranging from 1 ng/ml (for BAC
inserts) to 3 ng/µl (for P1 bacteriophage inserts) in 10 mM Tris-HCl, pH 7.4, 250 nM EDTA containing
100 mM NaCl, 30 µM spermine, and 70 µM spermidine. When the DNA to be microinjected has a large
size, polyamines and high salt concentrations can be used in order to avoid mechanical breakage of this
DNA, as described by Schedl et al. (1993b), which disclosure is hereby incorporated by reference in its
entirety.

Any one of the polynucleotides of the invention, including the polynucleotide constructs
described herein, may be introduced in an embryonic stem (ES) cell line, preferably a mouse ES cell line.
ES cell lines are derived from pluripotent, uncommitted cells of the inner cell mass of pre-implantation
blastocysts. Preferred ES cell lines are the following: ES-E14TG2a (ATCC No. CRL-1821), ES-D3
(ATCC No. CRL1934 and No. CRL-11632), YS001 (ATCC No. CRL-11776), 36.5 (ATCC No.
CRL-11116). ES cells are maintained in an uncommitted state by culture in the presence of growth-inhibited
feeder cells which provide the appropriate signals to preserve this embryonic phenotype and serve as a
matrix for ES cell adherence. Preferred feeder cells are primary embryonic fibroblasts that are established
from tissue of day 13 to day 14 embryos of virtually any mouse strain, that are maintained in culture, such
as described by Abbondanzo et al. (1993) and are growth-inhibited by irradiation, such as described by
Robertson (1987), or by the presence of an inhibitory concentration of LIF, such as described by Pease
and Williams (1990), which disclosures are hereby incorporated by reference in their entireties.

The constructs in the host cells can be used in a conventional manner to produce the gene product
encoded by the recombinant sequence.

Following transformation of a suitable host and growth of the host to an appropriate cell density, 
the selected promoter is induced by appropriate means, such as temperature shift or chemical induction, 
and cells are cultivated for an additional period. Cells are typically harvested by centrifugation, disrupted
by physical or chemical means, and the resulting crude extract retained for further purification. Microbial
cells employed in the expression of proteins can be disrupted by any convenient method, including
freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well
known by the skilled artisan. 

Transgenic Animals

The terms "transgenic animals" or "host animals" are used herein to designate animals that have
their genome genetically and artificially manipulated so as to include one of the nucleic acids according to
the invention. Preferred animals are non-human mammals and include those belonging to a genus
selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have their genome
artificially and genetically altered by the insertion of a nucleic acid according to the invention. In one
embodiment, the invention encompasses non-human host mammals and animals comprising a 
recombinant vector of the invention or a GENSET gene disrupted by homologous recombination with a 
knock out vector. 

The transgenic animals of the invention all include within a plurality of their cells a cloned 
recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic acids
comprising a GENSET coding sequence, a GENSET regulatory polynucleotide, a polynucleotide 
construct, or a DNA sequence encoding an antisense polynucleotide such as described in the present 
specification. 

Generally, a transgenic animal according to the present invention comprises any of the
polynucleotides, the recombinant vectors and the cell hosts described in the present invention. In a first
preferred embodiment, these transgenic animals may be good experimental models in order to study the
diverse pathologies related to the dysregulation of the expression of a given GENSET gene, in particular
the transgenic animals containing within their genome one or several copies of an inserted polynucleotide
encoding a native GENSET protein, or alternatively a mutant GENSET protein.

In a second preferred embodiment, these transgenic animals may express a desired polypeptide of
interest under the control of the regulatory polynucleotides of the GENSET gene, leading to high yields in
the synthesis of this protein of interest, and eventually to tissue specific expression of the protein of
interest.

In a third preferred embodiment, these transgenic animals may express a desired polypeptide of
interest fused to a GENSET signal peptide sequence, leading to the secretion of the fusion (chimeric)
polypeptide.

The design of the transgenic animals of the invention may be made according to conventional
techniques well known to those skilled in the art. For more details regarding the production of
transgenic animals, and specifically transgenic mice, reference may be made to US Patents Nos 4,873,191,
issued Oct. 10, 1989; 5,464,764, issued Nov. 7, 1995; and 5,789,215, issued Aug. 4, 1998; these documents
being herein incorporated by reference to disclose methods of producing transgenic mice.

Transgenic animals of the present invention are produced by the application of procedures which
result in an animal with a genome that has incorporated exogenous genetic material. The procedure
involves obtaining the genetic material which encodes either a GENSET coding sequence, a GENSET
regulatory polynucleotide or a DNA sequence encoding a GENSET antisense polynucleotide, or a portion
thereof, such as described in the present specification. A recombinant polynucleotide of the invention is
inserted into an embryonic or ES stem cell line. The insertion is preferably made using electroporation,
such as described by Thomas et al. (1987), which disclosure is hereby incorporated by reference in its
entirety. The cells subjected to electroporation are screened (e.g. by selection via selectable markers, by
PCR or by Southern blot analysis) to find positive cells which have integrated the exogenous recombinant
polynucleotide into their genome, preferably via a homologous recombination event. An illustrative
positive-negative selection procedure that may be used according to the invention is described by
Mansour et al. (1988), which disclosure is hereby incorporated by reference in its entirety.

The positive cells are then isolated, cloned and injected into 3.5-day-old blastocysts from mice,
such as described by Bradley (1987), which disclosure is hereby incorporated by reference in its entirety.
The blastocysts are then inserted into a female host animal and allowed to grow to term. Alternatively,
the positive ES cells are brought into contact with 2.5-day-old embryos at the 8-16 cell stage (morulae)
such as described by Wood et al. (1993), or by Nagy et al. (1993), which disclosures are hereby
incorporated by reference in their entireties, the ES cells being internalized to colonize extensively the
blastocyst including the cells which will give rise to the germ line.

The offspring of the female host are tested to determine which animals are transgenic, i.e., include
the inserted exogenous DNA sequence, and which ones are wild type.

Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a
recombinant expression vector or a recombinant host cell according to the invention.

Recombinant Cell Lines Derived From The Transgenic Animals Of The Invention:

A further object of the invention comprises recombinant host cells obtained from a transgenic
animal described herein. In one embodiment the invention encompasses cells derived from non-human
host mammals and animals comprising a recombinant vector of the invention or a GENSET gene
disrupted by homologous recombination with a knock out vector.

Recombinant cell lines may be established in vitro from cells obtained from any tissue of a
transgenic animal according to the invention, for example by transfection of primary cell cultures with
vectors expressing oncogenes such as SV40 large T antigen, as described by Chou (1989), and Shay et al.
(1991), which disclosures are hereby incorporated by reference in their entireties.

Uses of polypeptides of the invention

Proteins containing multimerization domains

The invention relates to compositions and methods using proteins of the invention containing a
multimerization domain such as a leucine zipper or a helix loop helix domain.

Proteins of the invention containing a leucine zipper domain are herein referred to as LZP. LZP,
such as the ones described in this section and those containing a leucine zipper domain as shown in
Table VI, or parts thereof, preferably fragments comprising a leucine zipper domain, or derivatives
thereof, may be used to mediate multimerization of proteins of interest.

The leucine zipper consists of a periodic repetition of leucine residues at every seventh position,
covering a distance spanning eight helical turns. The segments containing these periodic arrays of leucine
residues appear to exist in an alpha-helical conformation, and the leucine side chains extending from one
alpha-helix interact with those from a similar alpha helix of a second polypeptide, facilitating dimerization.
The structure formed by cooperation of these two regions forms a coiled coil (O'Shea E.K., Rutkowski R.,
Kim P.S. Science 243:538-542, 1989).
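
The heptad periodicity described above can be illustrated with a minimal Python sketch that scans a protein sequence for runs of leucine residues spaced exactly seven positions apart. The function name, the default threshold of four leucines, and the toy sequence in the usage note are hypothetical choices made for illustration; a simple spacing scan of this kind is only a crude first-pass filter and is not a substitute for structural or experimental confirmation of a leucine zipper.

```python
import re
from typing import List, Tuple


def find_leucine_heptads(protein: str, min_leucines: int = 4) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each run of leucines at every 7th position.

    A hit is a leucine followed by (min_leucines - 1) further leucines, each
    separated from the previous one by exactly six arbitrary residues.
    """
    if min_leucines < 2:
        raise ValueError("min_leucines must be at least 2")
    # e.g. for 4 leucines: L......L......L......L
    pattern = "L" + "(?:.{6}L)" * (min_leucines - 1)
    # A lookahead is used so that overlapping runs are all reported.
    return [(m.start(), m.group(1))
            for m in re.finditer("(?=(" + pattern + "))", protein.upper())]
```

For a hypothetical toy sequence built as "MK" followed by three copies of "LAAAAAA", a final "L", and "GG", the function reports a single run beginning at position 2; a sequence whose leucines are spaced six residues apart yields no hit.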

Leucine zippers contribute to targeting of various proteins (e.g., glucose transporters, Asano, et al.,
J. Biol Chem., 267, 19636-19641 (1992)) and permit dimerization of various cytoplasmic hormone
receptors and enzymes (Forman, et al., Mol Endocrinol, 3, 1610-1626 (1989)). Leucine zippers are also a
common feature of protein transcription factors, where they permit homo- or heterodimerization resulting
in tight binding to DNA strands (for reviews, see Abel, et al., Nature 341, 24-25 (1989); Jones, et al., Cell
61, 9-11 (1990); Lamb, et al., Trends in Biochemical Sciences 16, 417-422 (1991)).

Leucine zippers have been shown to be useful tools in several areas of biotechnology, especially
in protein engineering, where their ability to mediate homo-dimerization or hetero-dimerization has found
several applications. For example, Bosslet et al. have described the use of a pair of leucine zippers for in
vitro diagnosis, in particular for the immunochemical detection and determination of an analyte in a
biological liquid (US patent 5,643,731). Tso et al. have used leucine zippers for producing bispecific
antibody heterodimers (US patent 5,932,448). Methods of preparing soluble oligomeric proteins using
leucine zippers have been described by Conrad et al. (US patent 5,965,712), Ciardelli et al. (US patent
5,837,816), and Spriggs et al. (WO9410308). Leucine zipper forming sequences have been used by Pelletier
et al. in protein fragment complementation assays to detect biomolecular interactions (WO9834120).
Because of their usefulness in biotechnology, it is thus of considerable interest to isolate new leucine
zipper domains.

The multimerization activity of proteins containing leucine zipper domains may be assayed using
any of the assays known to those skilled in the art including circular dichroism spectrum and thermal
melting analyses as described in US patent 5,942,433. Alternatively, the leucine zipper motif in LZP
could be used by those skilled in the art as a "bait protein" in a well established yeast two-hybrid
system to identify its interacting protein partners in vivo from cDNA libraries derived from different tissues
or cell types of a given organism. Alternatively, LZP or part thereof could be used by those skilled in the art
in mammalian cell transfection experiments. When fused to a suitable peptide tag such as a [His]6 tag in a
protein expression vector and introduced into cultured cells, the expressed fusion protein can be
immunoprecipitated with its potential interacting proteins by using an anti-tag peptide antibody. This
method could be chosen either to identify the associated partner or to confirm the results obtained by
other methods such as those just mentioned.

In a preferred embodiment, the invention relates to compositions and methods of using LZP or
part thereof for preparing soluble multimeric proteins, which consist of multimers of fusion proteins
containing a leucine zipper fused to a protein of interest, using any technique known to those skilled in
the art including those described in international patent WO9410308, which disclosure is hereby
incorporated by reference in its entirety. In another preferred embodiment, LZP or derivative thereof is
used to produce bispecific antibody heterodimers as described in US patent 5,932,448, which disclosure is
hereby incorporated by reference in its entirety. Briefly, leucine zippers capable of forming heterodimers
are respectively linked to epitope binding components with different specificities. Bispecific antibodies
are formed by pairwise association of the leucine zippers, forming a heterodimer which links two
distinct epitope binding components. In still another preferred embodiment, LZP or part thereof or
derivative thereof is used for detection and determination of an analyte in a biological liquid as described
in US patent 5,643,731, which disclosure is hereby incorporated by reference in its entirety. Briefly, a
first leucine zipper is immobilized on a solid support and the second leucine zipper is coupled to a
specific binding partner for an analyte in a biological fluid. The two peptides are then brought into
contact thereby immobilizing the binding partner on the solid phase. The biological sample is then
contacted with the immobilized binding partner and the amount of analyte in the sample bound to the
binding partner determined. In still another preferred embodiment, the LZP or part thereof may be used
to synthesize novel nucleic acid binding proteins which are able to multimerize with proteins of interest,
for example to inhibit and/or control cellular growth using any genetic engineering technique known to
those skilled in the art including the ones described in the US patent 5,942,433, which disclosure is
hereby incorporated by reference in its entirety.

In another embodiment, the invention relates to compositions and methods using the LZP or part
thereof or derivative thereof in protein fragment complementation assays to detect biomolecular
interactions in vivo and in vitro as described in international patent WO9834120, which disclosure is
hereby incorporated by reference in its entirety. Such assays may be used to study the equilibrium and
kinetic aspects of molecular interactions including protein-protein, protein-nucleic acid,
protein-carbohydrate and protein-small molecule interactions, to screen cDNA libraries for unknown
proteins that bind to a target protein, or to screen libraries of small organic molecules for biological activity.

Still another object of the present invention relates to the use of the LZP or part thereof for
identifying new leucine zipper domains using any technique for detecting protein-protein interactions
known to those skilled in the art. Among the traditional methods which may be employed are
co-immunoprecipitation, crosslinking and co-purification through gradients or chromatographic columns of
cell lysates. Once isolated as a protein interacting with the LZP, such an intracellular protein can be
identified (e.g. its amino acid sequence determined) and can, in turn, be used, in conjunction with
standard techniques, to identify other proteins with which it interacts. The amino acid sequence thus
obtained may be used as a guide for the generation of oligonucleotide mixtures that can be used to screen
for gene sequences encoding such intracellular proteins. Screening may be accomplished, for example, by
standard hybridization or PCR techniques. Techniques for the generation of oligonucleotide mixtures and
the screening are well known. (See, e.g., Ausubel et al., eds., Current Protocols in Molecular Biology,
J. Wiley and Sons (New York, NY 1993) and PCR Protocols: A Guide to Methods and Applications, 1990,
Innis, M. et al., eds., Academic Press, Inc., New York.)

Alternatively, methods may be employed which result in the simultaneous identification of genes
which encode the intracellular proteins that can dimerize with the LZP or part thereof using any technique
known to those skilled in the art. These methods include, for example, probing cDNA expression
libraries, in a manner similar to the well-known technique of antibody probing of lambda gt11 libraries,
using as a probe a labeled version of the LZP or part thereof, or a fusion protein, e.g., the LZP or part
thereof fused to a marker (e.g., an enzyme, fluor, luminescent protein, or dye), or an Ig-Fc domain (for
technical details on screening of cDNA expression libraries, see Ausubel et al., supra). Alternatively,
another method for the detection of protein interaction in vivo, the two-hybrid system, may be used.

Protein of SEQ ID NO: 261 (internal designation 116-054-3-0-E6-CS)

The 233-amino-acid protein of SEQ ID NO: 261 encoded by the cDNA of SEQ ID NO: 20
displays two leucine zipper sites at positions 142-163 and 170-191.
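
By way of non-limiting illustration, a leucine zipper pattern of the kind referred to above can be located computationally. The following minimal sketch assumes that the pattern is the PROSITE-style motif L-x(6)-L-x(6)-L-x(6)-L, which spans 22 residues and is thus consistent with the 22-residue intervals recited above. The present description does not state which detection method was actually used, and the function name find_leucine_zippers and the toy sequence are hypothetical.

```python
import re

# Assumed motif: L-x(6)-L-x(6)-L-x(6)-L, i.e. four leucines spaced seven
# residues apart, spanning 22 residues in total. A lookahead is used so
# that overlapping occurrences are all reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(sequence):
    """Return (start, end) positions, 1-based and inclusive, of each motif match."""
    sequence = sequence.upper()
    return [(m.start() + 1, m.start() + 22) for m in LEUCINE_ZIPPER.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not one of the sequences of the invention.
    toy = "MK" + "LAAAAAA" * 3 + "L" + "GG"
    print(find_leucine_zippers(toy))
```

A pattern match of this kind only indicates a candidate site; whether the region actually mediates dimerization would have to be established experimentally.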

It is believed that the protein of SEQ ID NO: 261 is able to dimerize either with itself
(homo-dimerisation) or with a heterologous protein (hetero-dimerisation) of interest, through the mediation of
its leucine zipper domain. Preferred polypeptides of the invention are polypeptides comprising fragments
of SEQ ID NO: 261 from position 142-163 and 170-191, and fragments having any of the biological
activities described herein.

Protein of SEQ ID NO: 263 (internal designation 116-055-2-0-F7-CS)

The protein of SEQ ID NO: 263 encoded by the cDNA of SEQ ID NO: 22 displays a leucine
zipper pattern situated near its NH2 terminal part (position 15 to 36).

It is believed that the protein of SEQ ID NO: 263 is able to dimerize either with itself
(homo-dimerisation) or with a heterologous protein (hetero-dimerisation) of interest, through the mediation of
its leucine zipper domain. Preferred polypeptides of the invention are polypeptides comprising fragments
of SEQ ID NO: 263 from position 15 to 36, and fragments having any of the biological activities
described herein.

Protein of SEQ ID NO: 245 (internal designation 105-026-1-0-A5-CS)

The protein of SEQ ID NO: 245 encoded by the cDNA of SEQ ID NO: 4 displays a leucine zipper
pattern situated near its COOH terminal part (position 371 to 392).

It is believed that the protein of SEQ ID NO: 245 is able to dimerize either with itself
(homo-dimerisation) or with a heterologous protein (hetero-dimerisation) of interest, through the mediation of
its leucine zipper domain. Preferred polypeptides of the invention are polypeptides comprising fragments
of SEQ ID NO: 245 from position 371 to 392, and fragments having any of the biological activities
described herein.

Protein of SEQ ID NO: 257 (internal designation 106-043-4-0-H3-CS)

The 265-amino-acid-long protein of SEQ ID NO: 257 encoded by the cDNA of SEQ ID NO: 16 exhibits
homology to a Homo sapiens hypothetical protein (Genbank accession number AJ278482). These two
proteins are probably the result of alternative splicing.

The protein of SEQ ID NO: 257 displays a leucine zipper pattern situated from position 155 to
176. Thus, it is believed that the protein of SEQ ID NO: 257 is able to dimerize either with itself
(homo-dimerisation) or with a heterologous protein (hetero-dimerisation) of interest, through the mediation of
its leucine zipper domain. Preferred polypeptides of the invention are polypeptides comprising leucine
zipper domain fragments and fragments having any of the biological activities described herein.

Protein of SEQ ID NO: 314 (internal designation 188-41-1-0-B8-CS.cor)

A growing number of proteins have been shown to undergo post-translational modification by
fatty acids that are covalently linked to cysteine residues through a thioester bond. Fatty acid
modifications contribute to intracellular protein localization by facilitating membrane binding and also by
strengthening protein-protein interactions. Cycles of palmitoylation and depalmitoylation have been
described for a number of intracellular proteins, but the relevant enzymes that catalyze these processes
have yet to be fully characterized and the full significance of these cycles remains to be elucidated.

Palmitoyl-protein thioesterase-1 (PPT1) is a lysosomal hydrolase that removes long-chain fatty
acyl groups from modified cysteine residues in proteins. Mutations in PPT1 have been found to cause the
infantile form of neuronal ceroid lipofuscinosis (INCL).

Soyombo and Hofmann (J. Biol. Chem. 272: 27456-27463 [1997]) identified cDNAs encoding
PPT2. The deduced PPT2 protein contains 302 amino acids, including a 27-amino acid leader peptide, a
sequence motif characteristic of many thioesterases and lipases, and 5 potential N-linked glycosylation
sites. PPT2 shares 18% amino acid identity with PPT1. Soyombo and Hofmann tentatively localized the
human PPT2 gene to 6p21.3. Northern blot analysis detected a predominant 2.0-kb PPT2 transcript in the
human tissues examined, with the highest expression in skeletal muscle; variable amounts of 2.8- and
7.0-kb transcripts were also observed.

Cell fractionation studies indicate that PPT2 is present in the lysosomal fraction. Immunoblot
analysis of recombinant PPT2 expressed in mammalian cells showed 6 PPT2 proteins ranging in size
from 31 to 42 kDa. Treatment that removes asparagine-linked oligosaccharides resulted in a single major
protein of 31 kDa and a minor protein of 33 kDa.

Recombinant PPT2, like PPT1, possesses thioesterase activity and localizes to the lysosome.
Since PPT2 could not substitute for PPT1 in correcting the metabolic defect in INCL cells and was unable
to remove palmitate groups from palmitoylated proteins, it appears that PPT2 possesses a different
substrate specificity than PPT1. Another study, however, was able to show, after expression of the
recombinant protein in a baculovirus system and using cell lysate as substrate, that the protein had
S-thioesterase activity with a preference for the acyl groups of palmitic and myristic acid.

The subject invention provides the protein/polypeptide of SEQ ID NO:314, encoded by the cDNA
of SEQ ID NO:73. The invention also provides biologically active fragments of SEQ ID NO:314. In one
embodiment, the polypeptides of SEQ ID NO:314 are interchanged with the corresponding polypeptide
encoded by the human cDNA of clone 188-41-1-0-B8-CS. "Biologically active fragments" are defined as
those peptide or polypeptide fragments having at least one of the biological functions of the full length
protein (e.g., removal of long-chain fatty acyl groups from modified cysteine residues in proteins).
Compositions of the protein/polypeptide of SEQ ID NO:314, or biologically active fragments thereof, are
also provided by the subject invention. These compositions may be made according to methods well
known in the art.

The invention also provides variants of the protein of SEQ ID NO:314. These variants have at
least about 80%, more preferably at least about 90%, and most preferably at least about 95% amino acid
sequence identity to the amino acid sequence encoded by SEQ ID NO:73. Variants according to the
subject invention also have at least one functional or structural characteristic of the protein of SEQ ID
NO:314. The invention also provides biologically active fragments of the variant proteins. Compositions
of variants, or biologically active fragments thereof, are also provided by the subject invention. These
compositions may be made according to methods well known in the art. Unless otherwise indicated, the
methods disclosed herein can be practiced utilizing the protein encoded by SEQ ID NO:73, biologically
active fragments of SEQ ID NO:314, variants of SEQ ID NO:314, and biologically active fragments of
the variants.
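
By way of non-limiting illustration, a threshold such as "at least about 80% amino acid sequence identity" can be checked once two sequences have been aligned. The following minimal sketch assumes that the two sequences are supplied already aligned and of equal length, with "-" marking gaps, and that identity is computed over the aligned columns in which neither sequence has a gap. Other conventions for the denominator exist, and the present description does not prescribe one. The function name percent_identity and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: 9 of the 10 ungapped columns are identical.
    print(percent_identity("MKTLLVA-GLA", "MKTLIVAAGLA"))
```

The alignment itself would in practice be produced by a dedicated alignment program; this sketch covers only the final counting step.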

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode
the amino acid sequence of SEQ ID NO:314. In a preferred embodiment, SEQ ID NO:314 is encoded by
clone 188-41-1-0-B8-CS or the cDNA of SEQ ID NO:73. It is well within the skill of a person trained in
the art to create these alternative DNA sequences which encode proteins having the same, or essentially
the same, amino acid sequence. These variant DNA sequences are, thus, within the scope of the subject
invention. As used herein, reference to "essentially the same" sequence refers to sequences that have
amino acid substitutions, deletions, additions, or insertions that do not materially affect biological
activity. Fragments retaining one or more characteristic biological activities of the protein encoded by
clone 188-41-1-0-B8-CS are also included in this definition.
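
By way of non-limiting illustration, the extent of this redundancy can be estimated by multiplying, residue by residue, the number of codons that specify each amino acid. The following minimal sketch assumes the standard genetic code and ignores the stop codon; the function name count_encoding_sequences and the toy peptide are hypothetical.

```python
# Number of codons for each amino acid in the standard genetic code
# (61 sense codons in total; stop codons are not counted).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encoding_sequences(peptide):
    """Count the distinct coding DNA sequences for a peptide (standard code assumed)."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_DEGENERACY[residue]
    return total


if __name__ == "__main__":
    # Hypothetical toy peptide: Met (1 codon) x Lys (2 codons) x Leu (6 codons).
    print(count_encoding_sequences("MKL"))
```

Because the count grows multiplicatively with length, even a short peptide can be encoded by a very large number of distinct DNA sequences.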

In one aspect of the subject invention, SEQ ID NO:314, and variants thereof, can be used to
generate polyclonal or monoclonal antibodies. Both biologically active and immunogenic fragments of
SEQ ID NO:314, or variant proteins, can be used to produce antibodies. Polyclonal and/or monoclonal
antibodies can be made according to methods well known to the skilled artisan. Antibodies produced in
accordance with the subject invention can be used in a variety of detection assays known to those skilled
in the art.

SEQ ID NO:314 can be used as a marker for identification of lysosome dysfunction in 
individuals. In this aspect of the subject invention, antibodies specific for SEQ ID NO:314, or fragments 
thereof, are used in routine immunoassays to screen for the presence or absence of SEQ ID NO:314, or 
fragments thereof, in samples containing lysosomal contents. The presence or absence of the protein of 
SEQ ID NO:314 can be used to provide an indication of lysosomal function and is, thus, useful for
diagnostic/prognostic identification of lysosomal dysfunction. 

The subject invention also provides materials and methods for the screening of individual
samples for the presence or absence of nucleic acids encoding the protein of SEQ ID NO:314, or variants
thereof. In one embodiment, nucleic acids are provided for hybridization assays, known to those skilled
in the art, of mRNA or cDNA. The hybridization assays are performed upon nucleic acid samples
obtained, or derived from, an individual with suspected lysosomal dysfunction. The hybridization assays
screen for the presence or absence of nucleic acids encoding SEQ ID NO:314, or variants thereof. The
presence or absence of such nucleic acids can be used as a predictive/prognostic indicator of disease state
or lysosome function.

Nucleic acids of the invention can also be used in gene replacement or gene therapy protocols.
In this aspect of the subject invention, nucleic acids encoding SEQ ID NO:314, or biologically active
fragments thereof, can be introduced into cells and implanted into an individual with lysosomal disorders.
In one embodiment, genetically engineered macrophages can be used for the treatment regimen (see, for
example, Eto and Ohashi [2000] J. Inherit. Metabol. Dis. 23:293-298). Alternatively, autologous cells
may be obtained from an individual, transformed with nucleic acid ex vivo, expanded ex vivo, and
reintroduced into the individual. Such methods are well known to the skilled artisan.

Protein of SEQ ID NO: 280 (internal designation 160-75-4-0-A9-CS): 

The protein of SEQ ID NO:280, encoded by the cDNA of SEQ ID NO:39 and expressed in the
fetal brain, is a chromosome 12 paralog of C7orf2, a human protein described as a transmembrane
receptor located on chromosome 7 (Heus, H. C., A. Hing, et al. (1999) Genomics 57(3): 342-51). In
addition, this protein is an ortholog of the murine gene LMBR1L, found to be involved in polydactyly in
mice (Clark, R. M., P. C. Marker, et al. (2000) Genomics 67(1): 19-27). A high level of homology was
also found with a gene identified in Fugu rubripes (AF056116), as well as with C. elegans R05D3.2
(Gellner, K. and S. Brenner (1999) Genome Res 9(3): 251-8).

The 362-amino-acid-long protein of SEQ ID NO:280, encoded by the cDNA of SEQ ID NO:39,
is a splice variant of Z64989, located on chromosome 12. The chromosome 12 gene has 6 known variants
described in entries AK001356 and AK001651 in GenBank and entries A26354, A26375, X27360 and
Z64989 in geneseqn. The closest sequence is Z64989, at both the nucleotide and the protein level.
Z64989 is split into 17 exons, of which the protein of the invention contains the last 14. The transcription
start site of the cDNA of SEQ ID NO:39 lies within the third intron of Z64989, and the protein of the
invention starts at position 128 of Z64989. In addition, 2 potential leucine zippers are present in the
protein of the invention (positions 136-157 and 272-293).

Preaxial polydactyly is a congenital hand malformation that includes duplicated thumbs, various
forms of triphalangeal thumbs, and duplications of the index finger. Clark et al. (supra) demonstrated the
correspondence between the spatial and temporal changes in Lmbr1 expression and the embryonic onset
of the polydactyly mutant phenotype, suggesting that a downregulation of Lmbr1 results in polydactyly. It is
likely that the Lmbr1 gene is involved in the patterning of limbs during mammalian development, for
example by receiving and transducing a locally secreted ligand in the developing limb.

It is believed that the protein of SEQ ID NO:280 is a paralog of human C7orf2, and is thus a
membrane bound protein implicated in the patterning of the mammalian body plan during early
development. For example, the protein of the invention may be involved in organizing limb development,
as well as in the development of the fetal brain. As such, the activity of the present protein likely
influences various cellular processes, including gene expression, cellular growth and proliferation, as well
as cellular differentiation. In addition, leucine zippers within the present protein render the protein
capable of undergoing specific protein-protein interactions with other leucine-zipper containing proteins,
including with itself (i.e. homodimerization). Preferred polypeptides of the invention are fragments of
SEQ ID NO:280 having any of the biological activities described herein.

In one embodiment of the present invention, the present protein can be used to identify cells of the
fetal brain. For example, the protein of the invention or part thereof may be used to synthesize specific
antibodies using any technique known to those skilled in the art. Such tissue-specific antibodies may then
be used to identify tissues of unknown origin, such as in forensic samples, differentiated tumor tissue that
has metastasized to foreign bodily sites, etc., or to differentiate different tissue types in a tissue
cross-section using immunochemistry. In addition, labeled reagents that can specifically bind to the protein of
the invention can be used to visualize cell membranes and the components of the secretory pathway in
cells, e.g. the ER and Golgi.

In another embodiment of the present invention, the present protein can be used to diagnose
developmental abnormalities, or the potential for such abnormalities, e.g. in a fetus, or in adults
(i.e. to determine whether they are carriers of a mutant copy of the gene). Individuals found to carry
one or two mutant copies of the present gene would be candidates for, e.g. gene therapy or other strategies
to correct or compensate for the gene deficiency, or for strategies to ensure that their children would not
be carriers of the mutated gene. The characterization of mutations in genes encoding the present protein
would also be of great value in understanding the nature of polydactyly and other developmental
disorders, thereby facilitating the development of other strategies for treating and preventing these
disorders.

In another embodiment, the present protein is used to modulate gene expression, cell growth and 
proliferation, and/or cell differentiation in cells in vitro or in vivo. For example, any of these behaviors 
can be increased or inhibited in cells grown in vitro, e.g. for protein production or for ex vivo therapeutic
strategies. In addition, any disease associated with an increase or decrease in any of these cellular 
behaviors in vivo can be treated or prevented by enhancing or inhibiting the expression or activity of the 
protein of the invention in cells in vivo. 

Proteins of SEQ ID NOs: 309 and 304 (internal designations 188-11-1-0-B3-CS and 187-34-0-0-112-CS)

The proteins of SEQ ID NOs: 309 and 304 are encoded by the cDNAs of SEQ ID NOs: 68 and
63. Accordingly, it will be appreciated that all characteristics and uses of the polypeptides of SEQ ID
NOs: 309 and 304 described throughout the present application also pertain to the polypeptides encoded
by human cDNA of clones 188-11-1-0-B3-CS and 187-34-0-0-112-CS. In addition, it will be appreciated
that all characteristics and uses of the nucleic acids of SEQ ID NOs: 68 and 63 described throughout the
present application also pertain to the nucleic acids of the human cDNAs of clones 188-11-1-0-B3-CS
and 187-34-0-0-112-CS.

The protein of SEQ ID NO: 309 (encoded by the clone having internal designation number
188-11-1-0-B3-CS) and the polymorphic variant thereof of SEQ ID NO: 304 (encoded by the clone having
internal identification number 187-34-0-0-112-CS and which differs from the polypeptide encoded by the
clone having internal designation number 188-11-1-0-B3-CS at a single amino acid) are highly
homologous to the first 279 amino acids of the LGI1 (Leucine-rich gene - Glioma Inactivated) protein.
Clones 188-11-1-0-B3-CS and 187-34-0-0-112-CS appear to be splicing and polymorphic variants of
LGI1. The LGI1 protein is 557 amino acids in length (see Somerville et al. (2000) Mammalian
Genome 11, 622-627; Chernova et al. (1998) Oncogene 17, 2873-2881, the disclosures of which are
incorporated herein by reference in their entireties). Clone 188-11-1-0-B3-CS aligns with the first 279
amino acids of LGI1, followed by the addition of 12 amino acids (VLREEHRFTNMS) at the C-terminal
end which do not appear to be homologous to LGI1. Like LGI1, clone 188-11-1-0-B3-CS and the
polymorphic variant 187-34-0-0-112-CS contain the LRR domain and are highly expressed in brain tissue.

LGI1 belongs to a large family of leucine-rich repeat (LRR) proteins. It is believed that the LRR
domains act as a region of protein-protein interaction. This has been substantiated as the family of known
LRR proteins has grown. Leucine-rich repeats have been identified as essential components in
glycoprotein hormone receptors, proteoglycans and the Trk proteins by expression of mutants and
artificial chimaeras in tissue culture and by biochemical analysis of the properties of these constructs.
Many transmembrane LRR proteins are known or suspected to encode truncated forms (N and L 6 , and slit
for example) with functional significance. The proteoglycan decorin, a secreted protein, binds TGF-β, a
growth factor which stimulates decorin expression. Since decorin inhibits growth of cultured cells, it may
form part of a negative feedback loop to regulate cell growth. This is similar to the proposed function of
the LGI1 receptor protein.

Analysis of brain gliomas has revealed that LGI1 expression is either abolished or greatly reduced
in high-grade tumors compared with more benign ones, indicating a role as a tumor suppressor gene
(Cowell et al. 2000; Cowell et al. 1998, the disclosures of which are incorporated herein by reference in
their entireties). Most glioblastoma multiforme (GBM) brain tumors contain only one genomic copy of LGI1,
and this one is almost invariably not expressed. How the gene is inactivated is not clear, although one
possibility is that chromosome or gene rearrangements, which occur in 20-25% of tumors, cause
inactivation as a result of a positional effect. Recently it was determined that the LGI1 gene is located on
10q24, and is disrupted by translocation in the T98G GBM cell line and is also rearranged in over 26% of
primary brain tumors. Alternatively, LGI1 may be part of a highly regulated pathway where inactivation
of other key members or highly specific transcription factors results in either inactivation of all genes in the
pathway or a failure to initiate transcription.

Since functional inactivation of LGI1 occurs during the transition of low-grade to high-grade
brain tumors, knockout or transgenic mice in which the expression of the protein of SEQ ID NO: 309 or
304 has been reduced, eliminated or altered may be used as a disease model. In particular, mice that
overexpress LGI1 may be used as a tumorigenesis model.

Mice are particularly useful as models for assessing the consequences of altering the level or
activity of the proteins of SEQ ID NO:309 or 304 or to identify agents useful in treating tumorigenesis,
since human and mouse LGI1 are highly conserved, showing 91% identity at the nucleotide level and
97% similarity at the amino acid level, with most of the amino acid substitutions being conservative. The
mouse lgi1 gene is 4.2 kb in length, while the human LGI1 is 2.2 kb in length. This difference in size
between the human and mouse gene is a result of the inclusion of a 2 kb sequence in the 3' untranslated
region in the mouse gene. Whether the additional sequence affects gene expression is not clear. Further
analysis of the genomic sequence reveals that the number of exon/intron boundaries is also similar in
humans and mice. The high degree of LGI1 conservation between mice and humans implies that this
gene has experienced a strong selection pressure. It is intriguing to speculate that any major deviations in
the primary protein sequence may result in a loss of function of this gene product. Total or partial loss of
the LGI1 gene function could, therefore, be lethal, which in turn implies that LGI1 plays an important role
in normal brain development as well as in tumor formation.

SEQ ID NOs:309 and 304 also have high homology with Slit, a secreted Drosophila protein
which plays a role in axon pathway development in the central nervous system. The
Slit protein is necessary for the normal development of the midline of the CNS, particularly the midline
glial cells, and for the concomitant formation of the commissural axon pathway. The process is
dependent on the level of Slit protein expression. It appears that the Slit protein is excreted by the midline
glial cells, where it is synthesized, and is eventually associated with the surface axons that traverse them.
Contact of cells with supernatant expressing the product of this gene increases the permeability of THP-1
monocyte cells to calcium. Thus, it is likely that Slit is involved in a signal transduction pathway that is
initiated when Slit protein binds a receptor on the surface of the monocyte cell.

In view of the above, it is believed that the proteins of SEQ ID NOs:309 and 304 are involved in a 
signal transduction pathway mediated through a receptor that modulates the differentiation and/or 
proliferation of cells. 

Northern blot analysis detects LGI1 transcripts only in brain, neural tissue, and skeletal muscle
but not in heart, kidney, lung, placenta, liver, or pancreas. Northern blot analysis of RNA derived from
several different regions of human brain revealed a widespread expression of LGI1 although with
different intensities. The highest abundance was found in cerebral cortex, hippocampus, and putamen.
The lowest expression was detected in corpus callosum. The levels of expression were intermediate in
the other brain regions. Accordingly, the proteins of SEQ ID NOs:309 or 304 or fragments thereof, as
well as polynucleotides encoding the proteins of SEQ ID NOs:309 or 304, may be used to determine
whether a tissue sample is derived from brain (and in particular cerebral cortex, hippocampus, or
putamen), neural tissue, or skeletal muscle, or to distinguish whether a tissue sample is derived from brain
or another tissue, such as heart, kidney, lung, placenta, liver, or pancreas.

Accordingly, the present invention includes the use of the protein of SEQ ID NOs: 309 or 304,
fragments comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive
amino acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition,
such as those listed above, in an individual. In such embodiments, the protein of SEQ ID NO:309 or 304,
or a fragment thereof, is administered to an individual in whom it is desired to increase or decrease any of
the activities of the protein of SEQ ID NO:309 or 304, including tumor suppression, modulation of neural
development or involvement in brain tumors, glioblastoma multiforme, brain injuries, neurodegenerative
disease states and behavioral disorders such as Alzheimer's disease, Parkinson's disease, epilepsy,
multiple sclerosis, Huntington's disease, schizophrenia, obsessive compulsive disorders, and in the
processes of nerve regeneration in spinal cord injury, stroke, facial nerve damage, diabetes-induced nerve
damage, and retinal regeneration.

The protein of SEQ ID NO:309 or 304 or a fragment thereof may be administered directly to the
individual or, alternatively, a nucleic acid encoding the protein of SEQ ID NO:309 or 304 or a fragment
thereof may be administered to the individual. Alternatively, an agent which increases the activity of the
protein of SEQ ID NO:309 or 304 may be administered to the individual. Such agents may be identified
by contacting the protein of SEQ ID NO:309 or 304 or a cell or preparation containing the protein of SEQ ID
NO:309 or 304 with a test agent and assaying whether the test agent increases the activity of the protein.
For example, the test agent may be a chemical compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:309 or 304 may be decreased by
administering an agent which interferes with such activity to an individual. Agents which interfere with
the activity of the protein of SEQ ID NO:309 or 304 may be identified by contacting the protein or a cell
or preparation containing the protein with a test agent and assaying whether the test agent decreases the activity
of the protein. For example, the agent may be a chemical compound, a polypeptide or peptide, an
antibody, or a nucleic acid such as an antisense nucleic acid or a triple helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify tissues, preferably brain, or to
distinguish between two or more possible sources of a tissue sample on the basis of the level of the
protein of SEQ ID NO:309 or 304 in the sample. For example, the protein of SEQ ID NO:309 or 304 or
fragments thereof may be used to generate antibodies using any techniques known to those skilled in the
art, including those described herein. Such tissue-specific antibodies may then be used to identify tissues
of unknown origin, for example, forensic samples, differentiated tumor tissue that has metastasized to
foreign bodily sites, or to differentiate different tissue types in a tissue cross-section using
immunochemistry. In such methods a tissue sample is contacted with the antibody, which may be
detectably labeled, under conditions which facilitate antibody binding. The level of antibody binding to
the test sample is measured and compared to the level of binding to control cells from brain or tissues
other than brain to determine whether the test sample is from brain. Alternatively, the level of the protein
of SEQ ID NO:309 or 304 in a test sample may be measured by determining the level of RNA encoding
the protein of SEQ ID NO:309 or 304 in the test sample. RNA levels may be measured using nucleic acid
arrays or using techniques such as in situ hybridization, Northern blots, dot blots or other techniques
familiar to those skilled in the art. If desired, an amplification reaction, such as a PCR reaction, may be
performed on the nucleic acid sample prior to analysis. The level of RNA in the test sample is compared
to RNA levels in control cells from brain or tissues other than brain to determine whether the test sample
is from brain. For a number of disorders listed above, particularly of the nervous system, expression of
the genes encoding the polypeptide of SEQ ID NO:309 or 304 at significantly higher or lower levels may
be routinely detected in certain tissues or cell types (e.g., cancerous and wounded tissues) or bodily fluids
(e.g., serum, plasma, synovial fluid, and spinal fluid) or another tissue or cell sample taken from an
individual having such a disorder, relative to the standard gene expression level, i.e., the expression level
in healthy tissue or bodily fluid from an individual not having the disorder.
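
The comparison of a test sample against brain and non-brain controls described above can be
illustrated with a minimal sketch. It assumes that each sample has been reduced to a single numeric
signal (antibody binding or RNA level) and that the test sample is assigned to the control group whose
mean signal is nearest. The function name, the nearest-mean rule and all numeric values are hypothetical
and are given for illustration only; they are not taken from the disclosure.

```python
from statistics import mean


def closest_reference(test_level, reference_levels):
    """Return the control group whose mean signal is closest to the test signal."""
    means = {tissue: mean(values) for tissue, values in reference_levels.items()}
    return min(means, key=lambda tissue: abs(means[tissue] - test_level))


# Hypothetical, unitless signal intensities for the control groups.
references = {
    "brain": [9.1, 8.7, 9.4],
    "non-brain": [1.2, 0.9, 1.5],
}

print(closest_reference(8.2, references))
```

In practice a statistical test with replicate measurements would likely be preferred over a simple
nearest-mean rule; the sketch only shows the direction of the comparison.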

In another embodiment, antibodies to the protein of SEQ ID NO:309 or 304 or part thereof may
be used for detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:309 or
304, including using methods known to those skilled in the art. For example, an antibody against the
protein of SEQ ID NO:309 or 304 or a fragment thereof may be fixed to a solid support, such as a
chromatography matrix. A preparation containing cells expressing the protein of SEQ ID NO:309 or 304
is placed in contact with the antibody under conditions which facilitate binding to the antibody. The
support is washed and then the cells are released from the support by contacting the support with agents
which cause the cells to dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:309 or 304 or a 
fragment thereof may be used to diagnose disorders associated with altered expression of the protein of 
SEQ ID NO:309 or 304. In some embodiments, the protein of SEQ ID NO:309 or 304 or fragments 
thereof may be used to diagnose cancer. In such techniques, the level of the protein of SEQ ID NO:309
or 304 in an ill individual is measured using techniques such as those described herein and compared to 
the level in normal individuals. For example, a decreased level of the protein of SEQ ID NO:309 or 304 
relative to normal individuals suggests that the ill individual may suffer from cancer or be predisposed to 
getting cancer in the future. 

Another embodiment of the present invention is a polypeptide comprising a structural or
functional domain of the protein of SEQ ID NO:309 or 304. Such structural or functional domains of the 
protein of SEQ ID NO: 309 or 304 include a leucine rich repeat C-terminal domain located between amino 
acid positions 173 and 222, a leucine rich repeat located between amino acid positions 92 and 115, a 
leucine rich repeat located between amino acid positions 116 and 139, a leucine rich repeat located
between amino acid positions 140 and 163, a leucine rich repeat located between amino acid positions
164 and 185, a membrane spanning segment located between amino acid positions 15 and 35, and a signal 
peptide comprising the sequence FLCLLSALLLTEG/KK. 
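
The domain boundaries listed above can be captured in a small table for extracting the corresponding
fragments from a sequence. The following is a minimal sketch: positions are treated as 1-based and
inclusive, the numbering of the four leucine rich repeats (1 to 4) is an illustrative labeling, and the
placeholder sequence is hypothetical because the actual sequence of SEQ ID NO:309 or 304 is not
reproduced here.

```python
# Domain boundaries as listed in the text (1-based, inclusive amino acid positions).
DOMAINS = [
    ("membrane spanning segment", 15, 35),
    ("leucine rich repeat 1", 92, 115),
    ("leucine rich repeat 2", 116, 139),
    ("leucine rich repeat 3", 140, 163),
    ("leucine rich repeat 4", 164, 185),
    ("leucine rich repeat C-terminal domain", 173, 222),
]


def extract_domain(sequence, start, end):
    """Return the fragment spanning 1-based inclusive positions start..end."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("domain boundaries fall outside the sequence")
    return sequence[start - 1:end]


def overlapping_pairs(domains):
    """Return the names of all domain pairs whose position ranges overlap."""
    pairs = []
    for i, (name_a, start_a, end_a) in enumerate(domains):
        for name_b, start_b, end_b in domains[i + 1:]:
            if start_a <= end_b and start_b <= end_a:
                pairs.append((name_a, name_b))
    return pairs


# Hypothetical placeholder sequence of sufficient length (not the real protein).
placeholder = "A" * 222
fragment = extract_domain(placeholder, 92, 115)
print(len(fragment), overlapping_pairs(DOMAINS))
```

With the boundaries as stated, the only overlapping pair is the repeat at positions 164 to 185 and the
C-terminal domain at positions 173 to 222.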

Accordingly, the protein of SEQ ID NO:309 or 304 or fragments thereof, or polynucleotides
encoding these proteins or fragments, may be used in in vitro diagnostic assays for malignant brain
tumors, such as glioblastoma multiforme. These proteins or nucleic acids may also be used in the
attenuation, prevention and/or treatment of brain tumors and/or brain injuries, of neurodegenerative
disease states and behavioral disorders such as Alzheimer's disease, Parkinson's disease, epilepsy,
multiple sclerosis, Huntington's disease, schizophrenia, obsessive compulsive disorders, and in the
processes of nerve regeneration in spinal cord injury, stroke, facial nerve damage, diabetes-caused nerve
damage, and retinal regeneration.

In addition, the protein, as well as antibodies directed against the protein and relevant small
molecules, may be used as tumor markers and/or immunotherapy targets for the above disease states. For
example, antibodies directed against amino acids VLREIHRFTNMS of both clones may aid in the
differential detection of the secreted and receptor forms of this protein, since the proteins of SEQ ID
NOs:309 and 304 have homology to the secreted forms of LGI1. In addition, the proteins of SEQ ID
NOs:309 and 304 or fragments thereof may be used to identify binding partners as described herein.

DNA-binding proteins 

The invention relates to compositions and methods using proteins of the invention containing a
DNA-binding domain, herein referred to as DBP, such as the ones described in this section and those
containing a DNA-binding domain as shown in Table VI, or parts thereof, preferably fragments
comprising a DNA-binding domain, or derivatives thereof.

Transcriptional regulation is primarily achieved by the sequence-specific binding of proteins to DNA 
and RNA. Of the known protein motifs involved in the sequence specific recognition of DNA, the zinc 
finger protein is unique in its modular nature. Zinc finger domains are found in numerous zinc binding
proteins which are involved in protein-nucleic acid interactions. They are independently folded zinc- 
containing mini-domains which are used in a modular repeating fashion to achieve sequence-specific 
recognition of DNA (Klug 1993 Gene 135, 83-92). Such zinc binding proteins are commonly involved in the 
regulation of gene expression, and usually serve as transcription factors (see US patents 5,866,325; 6,013,453 
and 5,861,495).

To date, zinc finger proteins have been identified which contain between 2 and 37 modules. More
than two hundred proteins, many of them transcription factors, have been shown to possess zinc finger
domains. Zinc fingers connect transcription factors to their target genes mainly by binding to specific
sequences of DNA. Zinc finger modules are found in a wide variety of transcription regulatory proteins in
eukaryotic organisms. A zinc finger domain is generally composed of 25 to 30 amino acid residues which
form one or more tetrahedral ion binding sites. The binding sites contain four ligands consisting of the
sidechains of cysteine, histidine and occasionally aspartate or glutamate. The binding of zinc allows the
relatively short stretches of polypeptide to fold into defined structural units which are well-suited to
participate in macromolecular interactions (Berg, J. M. et al. (1996) Science 271:1081-1085). The zinc finger
domain was first recognized in the transcription factor TFIIIA from Xenopus oocytes (Miller, et al., EMBO,
4:1609-1614, 1985; Brown, et al., FEBS Lett., 186:271-274, (1985)).

Zinc binding domains which contain a C3HC4 sequence motif are known as RING domains
(Lovering, R. et al. (1993) Proc. Natl. Acad. Sci. USA 90:2112-2116). The RING domain consists of
eight metal binding residues, and the sequences that bind the two metal ions overlap (Barlow, P. N. et al.
(1994) J. Mol. Biol. 237:201-211). Functions of RING finger proteins are mediated through DNA binding
and include the regulation of gene expression, DNA recombination, and DNA repair (see Borden and
Freemont, Curr Opin Struct Biol 6:395-401 (1996) and US patent 5,861,495).

Both the RING finger and the LIM domain mediate protein-protein interactions and are involved in
transcriptional control, either by directly affecting transcription or recruiting co-activators or co-repressors.
LIM domains also contribute to various signalling pathways. They may interact with protein kinases and
anchor gene products to large protein complexes or to cellular compartments.

PHD fingers are C4HC3 zinc fingers spanning approximately 50-80 residues and distinct from RING
fingers or LIM domains. They are thought to be mostly DNA or RNA binding domains but may also be
involved in protein-protein interactions (for a review see Aasland et al, Trends Biochem Sci 20:56-59
(1995)). The PHD finger domain, belonging to the zinc finger domain family, is found in many regulatory
proteins which are frequently associated with chromatin-mediated transcriptional regulation.

The nucleic acid binding activity of DBP or part thereof may be assayed using any of the assays
known to those skilled in the art including those described in US patent 6,013,453.

The invention relates to compositions and methods using DBPs or part thereof, especially fragments 
comprising a DNA-binding domain, to stimulate gene transcription. 

One of the remarkable features of activation domains of transcriptional factors in general is that
"fusing" them to heterologous protein domains seldom affects their ability to activate transcription when
recruited to a wide variety of promoters. The high degree of functional independence exhibited by these 
activation domains makes them valuable tools in various biological assays for analyzing gene expression and 
protein-protein or protein-RNA or protein-small molecule drug interactions. Several strategies to improve 
the potency of activation domains and thereby the expression of genes under their control have been reported.
These approaches generally involve increasing the number of copies of activation domains fused to the DNA
binding domain or generating activators containing synergizing combinations of activation domains. 

Therefore, in an additional embodiment, this invention provides compositions and methods 
containing new transcription factors comprising DBP or part thereof, preferably fragments containing DNA-
binding domains. Such transcription factors may be designed to regulate the expression of target genes of
interest. Aspects of the invention are applicable to systems involving either covalent or non-covalent linking
of the transcription activation domain to a DNA binding domain. In practice, cells can be engineered by the 
introduction of recombinant nucleic acids encoding the fusion proteins containing at least two mutually 
heterologous domains, one of them being the DNA-binding domain of the invention, and in some cases 
additional nucleic acid constructs, to render them capable of ligand-dependent regulation of transcription of a
target gene. Administration of the ligand to the cells then regulates (positively, or in some cases, negatively)
target gene transcription (all laboratory methods related to this embodiment are completely described in US
patent 6,015,709, the disclosure of which is hereby incorporated by reference in its entirety). Illustrative
(non-limiting) examples of heterologous domains which can be included along with a DNA-binding domain in
various fusion proteins of this invention include other transcription regulatory domains (i.e., transcription
activation domains such as a p65, VP16 or AP domain; transcription potentiating or synergizing domains; or
transcription repression domains such as an ssn-6/TUP-1 domain or Kruppel family suppressor domain); a
DNA binding domain such as a GAL4, lexA or a composite DNA binding domain such as a composite zinc
finger domain or a ZFHD1 domain; or a ligand-binding domain comprising or derived from (a) an
immunophilin, cyclophilin or FRB domain; (b) an antibiotic binding domain such as tetR; or (c) a hormone
receptor such as a progesterone receptor or ecdysone receptor. A wide variety of ligand binding domains may
be used in this invention, although ligand binding domains which bind to a cell permeant ligand are preferred. 
It is also preferred that the ligand have a molecular weight under about 5 kD, more preferably below 2.5 kD 
and optimally below about 1500 D. Non-proteinaceous ligands are also preferred. Examples of ligand binding
domain/ligand pairs that may be used in the practice of this invention include, but are not limited to:
FKBP:FK1012, FKBP:synthetic divalent FKBP ligands (see WO 96/0609 and WO 97/31898),
FRB:rapamycin/FKBP (see e.g., WO 96/41865 and Rivera et al, "A humanized system for pharmacologic
control of gene expression", Nature Medicine 2(9):1028-1032 (1997)), cyclophilin:cyclosporin (see e.g. WO
94/18317), DHFR:methotrexate (see e.g. Licitra et al, 1996, Proc. Natl. Acad. Sci. U.S.A. 93:12817-12821),
TetR:tetracycline or doxycycline or other analogs or mimics thereof (Gossen and Bujard, 1992, Proc. Natl.
Acad. Sci. U.S.A. 89:5547; Gossen et al, 1995, Science 268:1766-1769; Kistner et al, 1996, Proc. Natl. Acad.
Sci. U.S.A. 93:10933-10938), a progesterone receptor:RU486 (Wang et al, 1994, Proc. Natl. Acad. Sci.
U.S.A. 91:8180-8184), ecdysone receptor:ecdysone or muristerone A or other analogs or mimics thereof
(No et al, 1996, Proc. Natl. Acad. Sci. U.S.A. 93:3346-3351) and DNA gyrase:coumermycin (see e.g. Farrar
et al, 1996, Nature 383:178-181). In many applications it is preferable to use a DNA binding domain which is
heterologous to the cells to be engineered. In the case of composite DNA binding domains, component 
peptide portions which are endogenous to the cells or organism to be engineered are generally preferred. 
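
The ligand binding domain/ligand pairs and the ligand size preferences set out above can be
summarized in a small lookup structure. The following is a minimal sketch: the pairs are those named in
the text, the thresholds are the stated values (about 5 kD, 2.5 kD and about 1500 D), and the function
name, the tier labels and the example molecular weights are hypothetical and serve only as illustration.

```python
# Ligand binding domain -> ligands, as named in the text.
LIGAND_PAIRS = {
    "FKBP": ["FK1012", "synthetic divalent FKBP ligands"],
    "FRB": ["rapamycin/FKBP"],
    "cyclophilin": ["cyclosporin"],
    "DHFR": ["methotrexate"],
    "TetR": ["tetracycline", "doxycycline"],
    "progesterone receptor": ["RU486"],
    "ecdysone receptor": ["ecdysone", "muristerone A"],
    "DNA gyrase": ["coumermycin"],
}


def size_preference(mw_daltons):
    """Map a ligand molecular weight (in daltons) to the preference tier in the text."""
    if mw_daltons < 1500:
        return "optimal"
    if mw_daltons < 2500:
        return "more preferred"
    if mw_daltons < 5000:
        return "preferred"
    return "outside preferred range"


# Hypothetical molecular weights, chosen only to exercise each tier.
for mw in (900, 2000, 4000, 6000):
    print(mw, size_preference(mw))
```

Because the text qualifies the thresholds with "about", the hard cut-offs used here are a
simplification.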

In another aspect of this embodiment, polynucleotides encoding DNA-binding domains as well as
any other functional fragments of DBP may be introduced into polynucleotides encoding fusion proteins for a
variety of regulated gene expression systems, including both allostery-based systems such as those regulated
by tetracycline, RU486 or ecdysone, or analogs or mimics thereof, and dimerization-based systems such as
those regulated by divalent compounds like FK1012, FKCsA, rapamycin, AP1510 or coumermycin, or
analogs or mimics thereof, all as described below (See also, Clackson, Controlling mammalian gene
expression with small molecules, Current Opinion in Chem. Biol. 1:210-218 (1997)). The fusion proteins
may comprise any combination of relevant components, including bundling domains, DNA binding domains,
transcription activation (or repression) domains and ligand binding domains. Other heterologous domains
may also be included.

Another embodiment of this invention relates to expression systems, preferably vectors and vector- 
containing cells, using DBP or part thereof, especially the DNA-binding domain. In this regard, recombinant 
nucleic acids are provided which encode fusion proteins containing the transcription activation domain of the
invention and at least one additional domain that is heterologous thereto, where the peptide sequence of said
activation domain is itself optionally modified relative to the naturally occurring sequence from which it was
derived to increase or decrease its potency as a transcriptional activator relative to the counterpart comprising
the native peptide sequence. Each of the recombinant nucleic acids of this invention may further comprise an
expression control sequence operably linked to the coding sequence and may be provided within a DNA 
vector, e.g., for use in transducing prokaryotic or eukaryotic cells. Some of the recombinant nucleic acids of a 
given composition as described above, including any optional recombinant nucleic acids, may be present 
within a single vector or may be apportioned between two or more vectors. The recombinant nucleic acids
may be provided as inserts within one or more recombinant viruses which may be used, for example, to
transduce cells in vitro or cells present within an organism, including a human or non-human mammalian 
subject. It should be appreciated that non-viral approaches (naked DNA, liposomes or other lipid 
compositions, etc.) may be used to deliver recombinant nucleic acids of this invention to cells in a recipient 
organism. The resultant engineered cells and their progeny containing one or more of these recombinant
nucleic acids or nucleic acid compositions of this invention may be used in a variety of important
applications, including human gene therapy, analogous veterinary applications, the creation of cellular or 
animal models (including transgenic applications) and assay applications. Such cells are useful, for example, 
in methods involving the addition of a ligand, preferably a cell permeant ligand, to the cells (or administration 
of the ligand to an organism containing the cells) to regulate expression of a target gene. 

The invention also relates to methods and compositions using DBP or part thereof to bind to
nucleic acids, preferably DNA, alone or in combination with other substances. For example, DBP or part 
thereof is added to a sample containing nucleic acid in conditions allowing binding, and allowed to bind 
to nucleic acids. In a preferred embodiment, DBP or part thereof may be used to purify nucleic acids 
such as restriction fragments. In another preferred embodiment, DBP or part thereof may be used to
visualize nucleic acids when the polypeptide is linked to an appropriate fusion partner, or is detected by
probing with an antibody. Alternatively, DBP or part thereof may be bound to a chromatographic 
support, either alone or in combination with other DNA binding proteins, using techniques well known in 
the art, to form an affinity chromatography column. A sample containing nucleic acids to be purified is run
through the column. Immobilizing DBP or part thereof on a support is particularly advantageous for
those embodiments in which the method is to be practiced on a commercial scale. This immobilization
facilitates the removal of the protein from the batch of product and subsequent reuse of the protein. 
Immobilization of DBP or part thereof can be accomplished, for example, by inserting a cellulose-binding 
domain in the protein. One of skill in the art will understand that other methods of immobilization could 
also be used and are described in the available literature. 

In another embodiment, the present invention relates to compositions and methods using DBP or
part thereof, especially the DNA-binding domain, to alter the expression of genes of interest in target
cells. Such genes of interest may be disease related genes, such as oncogenes, or exogenous genes from
pathogens, such as bacteria or viruses, using any techniques known to those skilled in the art including
those described in US patents 5,861,495; 5,866,325 and 6,013,453.

In still another embodiment, DBP or part thereof may be used to diagnose, treat and/or prevent 
disorders linked to dysregulation of gene transcription such as cancer and other disorders relating to 
abnormal cellular differentiation, proliferation, or degeneration, including hyperaldosteronism, 
hypocortisolism (Addison's disease), hyperthyroidism (Graves' disease), hypothyroidism, colorectal
polyps, gastritis, gastric and duodenal ulcers, ulcerative colitis, and Crohn's disease.

Protein of SEQ ID NO: 388 (internal designation 109-0Q2-4-0-C6-CS) 

The protein of SEQ ID NO: 388 encoded by the cDNA of SEQ ID NO: 147 is a 375 amino acid long
protein containing a zinc finger domain, namely a PHD-finger domain from positions 329 to 339.

The PHD finger was originally identified by comparison of the maize homeodomain (HD) protein
ZMHOX1a (Bellmann R. and Werr W. EMBO J. 11: 3367-3374 (1992)) to its Arabidopsis relative HAT3.1
and named plant homeodomain (PHD) finger due to its association with the DNA-binding HD in both genes. 
This motif often occurs in various regulatory genes, such as members of the trithorax (TRX-G) or polycomb 
(PC-G) groups (Aasland R. et al. Trends Biochem.Sci. 20: 56-59 (1995)) and leukaemia-associated proteins 
(LAP finger) (Saha V. et al. Proc.Natl.Acad.Sci. USA 92: 9737-9741 (1995)). The established function of
TRX-G and PC-G genes in chromatin modulation in Drosophila led to the suggestion that the PHD finger is
involved in chromatin-mediated transcriptional control. Recent data provide evidence that PHD finger 
proteins are associated with chromatin remodelling complexes (Bochar D.A. et al. Proc.Natl.Acad.Sci. USA 
97: 1038-1043 (2000)) or contribute to histone acetylation (Loewith R. et al. Mol.Cell.Biol. 20: 3807-3816 
(2000)). Based on the position of the unique His residue, the cysteine scaffold of the PHD finger (Cys4-His-
Cys3) is clearly distinct from RING fingers (Cys3-His-Cys4) and LIM domains (Cys2-His-Cys5) and from
DRIL domains, where two RING finger motifs are closely linked. In contrast to the accumulating knowledge 
about LIM domains, functional data concerning the PHD finger remain rare (see rev. Halbach T. et al. 
Nucleic Acids Research 28: 3542-3550 (2000)). 
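
The distinction drawn above rests on where the single His falls within the ordered series of eight
metal-binding residues. The following is a minimal sketch of that rule, assuming the eight ligands of a
candidate domain have already been identified and are supplied in sequence order as one-letter codes. The
function and table names are hypothetical, and the sketch does not attempt to locate the ligands within
a protein sequence.

```python
# Ordered ligand signatures taken from the scaffolds named in the text.
SCAFFOLDS = {
    "CCCHCCCC": "RING finger (Cys3-His-Cys4)",
    "CCCCHCCC": "PHD finger (Cys4-His-Cys3)",
    "CCHCCCCC": "LIM domain (Cys2-His-Cys5)",
}


def classify_scaffold(ligands):
    """Classify a domain from its eight metal-binding residues given in sequence order."""
    signature = "".join(ligands).upper()
    return SCAFFOLDS.get(signature, "unclassified")


print(classify_scaffold("CCCCHCCC"))
print(classify_scaffold(["C", "C", "C", "H", "C", "C", "C", "C"]))
```

Real domains also differ in the spacing between ligands, which this position-only rule ignores.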

GYMNOS, a recently described member of the SWI2/SNF2 protein family in plants (22), also
contains a PHD finger and takes part in the control of development. The second PHD finger motif of
Drosophila dMI-2 protein (a reference for animal counterparts) shares high sequence conservation to known 
plant PHD fingers. Due to the similarity to the Drosophila MI-2 gene, GYMNOS has been implicated in 
chromatin modulation. While the PHD finger is an isolated motif in GYMNOS, the characteristic Cys4-His-
Cys3 scaffold in PHDf-HD plant genes is embedded in a large region. This region shares 60% identical
residues between seven genes of different plant species and is more highly conserved than the HD (40%). 
This conservation suggests that the PHD finger is part of a larger functional unit. When combined with a 
leucine zipper in the surrounding conserved 180 amino acid region in the PHDf-HD proteins, PHD finger 
activity is masked and silenced. The leucine zipper upstream of the PHD finger mediates interactions with
helix 4 of plant 14-3-3 proteins, thus identifying PHDf-HD proteins as potential targets of 14-3-3 signalling 
pathways. The 14-3-3 family of multifunctional proteins is highly conserved between animals, plants and 
yeast. Due to the dimeric nature of 14-3-3 proteins and their capacity to form homo- and heterodimers, 
members of the 14-3-3 protein family function as scaffolds promoting association of protein complexes. 14-
3-3 proteins are involved in various signalling pathways that include, for example, Raf, BAD, Bcr/Bcr-Abl,
KSR (kinase suppressor of Ras), PKC, PI-3 kinase and cdc25C phosphatase. Others enter the nucleus and are
associated with DNA-binding complexes. Recent data even indicate contacts to TBP, TFIIB and the human
TBP-associated factor hTAF(II)32 (for rev. see Halbach T., supra).

Recently, the PHD finger has been shown to activate transcription in yeast, plant and animal cells.
Transcriptional activation in animal cells (in the zebrafish embryo as a test system) tested for different PHD
fingers seems to be a general feature of the PHD finger motif in eukaryotic cells.

It remains to be elucidated whether the PHD finger directly interacts with a component of the 
transcription initiation complex or if its positive effect on transcription is mediated via auxiliary protein 
interactions. Both assumptions, however, involve PHD finger-mediated protein-protein interactions.
Surrounding sequences may interfere sterically with access to the PHD finger, and its exposure could
possibly depend on binding of a protein partner.

The PHD finger containing proteins appear to be involved in human diseases. Studies on the AIRE 
gene from humans (Nagamine K. et al. Nat.Genet. 17: 393-398 (1997), Scott H.S. et al. Mol.Endocrinol. 12: 
1112-1119 (1998)) have shed more light on the importance of this motif, since all clinically significant
mutations in the AIRE gene coincide with alteration in two PHD fingers, resulting in the rare autoimmune
polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED). The presence of PHD fingers in genes 
up-regulated in leukaemia, associated with the autoimmune disease APECED or participating in euchromatin 
to heterochromatin modulation, like the TRX-G or PC-G genes, indicates that this motif may be involved in a 
variety of important cellular events including developmental disorders, tumors and immune diseases. For
example, the role of chromatin structure remodelling in cancer metastasis and tissue carcinogenesis is well
documented (Zhang Y. et al. Cell 16: 279-289 (1998); Klugbauer S. and Rabes H.M. Oncogene 29: 4388- 
4393 (1999)). 

It is believed that the protein of SEQ ID NO: 388 or part thereof is a zinc binding protein, preferably 
able to bind nucleic acids, more preferably a transcription factor. Preferred polypeptides of the invention are 
polypeptides comprising the amino acids of SEQ ID NO: 388 from positions 329 to 339. Other preferred
polypeptides of the invention are fragments of SEQ ID NO: 388 having any of the biological activities
described herein. 

In one embodiment of the invention, the protein of the invention, or part thereof, or derivative
thereof, may be used to diagnose in a subject developmental disorders and/or cell proliferative disorders
linked to dysregulation of gene expression mediated by the PHD-finger domain of the protein of the
invention. Such disorders include, but are not limited to, renal tubular acidosis, anemia, Cushing's syndrome,
achondroplastic dwarfism, epilepsy, gonadal dysgenesis, hereditary neuropathies such as Charcot-Marie-
Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's
chorea and cerebral palsy, spina bifida, and congenital glaucoma, cataract, sensorineural hearing loss, benign
tumors, and cancers such as adenocarcinoma; leukemia; melanoma; lymphoma; sarcoma; and cancers of the
bladder, colon, liver, brain, small intestine, large intestine, breast, ovary, kidney, lung, and prostate.
Diagnosis may be performed using nucleic acids or antibodies able to detect the expression of the protein of
the invention using any technique known to those skilled in the art including Northern blotting, RT-PCR,
immunoblotting methods, immunohistochemistry, and enzyme-linked immunosorbent assay (ELISA), as described
herein. Quantities of the protein of the invention expressed in subject samples, both control and disease, from
biopsied tissues, body fluids or cell extracts taken from patients are compared with the standard values.
Deviation between standard and subject values establishes the parameters for diagnosing disease.

In another embodiment, antagonists or inhibitors of the protein of the invention or part thereof may
be administered to patients to treat and/or prevent the above referred disorders. Antagonists or inhibitors of
transcriptional activators may indeed be used to suppress transcriptional activation in tumor cells. Such 
antagonists and/or inhibitors may be antibodies specific for the protein of the invention that can be used 
directly as an antagonist, or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical 
agent to cells or tissue which express the protein of the invention. Neutralizing antibodies (i.e., those which
inhibit protein-protein interactions) are especially preferred for therapeutic use. Other methods to inhibit the
expression of the protein of the invention include antisense and triple helix strategies as described herein.
Other antagonists or inhibitors of the protein of the invention may be produced using methods which are 
generally known in the art, including the screening of libraries of pharmaceutical agents to identify those 
which specifically bind the protein of the invention. The protein of the invention, or part thereof, preferably
its functional or immunogenic fragments, or oligopeptides related thereto, can be used for screening libraries
of compounds in any of a variety of drug screening techniques. The fragment employed in such screening 
may be free in solution, affixed to a solid support, borne on a cell surface, or located intracellularly. The 
formation of binding complexes, between the protein of the invention, or part thereof, or derivative thereof, 
and the agent being tested, may be measured. Another technique for drug screening which may be used 
provides for high throughput screening of compounds having suitable binding affinity to the protein of the
invention as described in published PCT application WO84/03564. 

Protein of SEQ ID NO: 394 (internal designation 157-17-2-0-C1-CS)

The protein of SEQ ID NO: 394 encoded by the extended cDNA SEQ ID NO: 153 contains a
myc-type, helix-loop-helix dimerization domain (Prosite PS00038) from amino acid position 13 to 28 and
has no adjacent basic domain. Using the Schiffer-Edmundson helical wheel diagram (Schiffer et al.
(1967) Biophys.J. 7:121-135), a hypothetical amphipathic alpha helix is predicted between position 53 and
position 68. Three hydrophobic amino acids, Val55, Phe59 and Ile63, are aligned on the same side of
the helix to present a hydrophobic interaction surface and three hydrophilic residues (Tyr53, Gln62 and
Ser64) are presented on the other side of the helix. There is no proline residue within the stretch to disrupt
the continuity of the alpha helix. Thus, these structural features in the protein of the invention indicate
that this protein could be a novel member of the nonbasic "helix-loop-helix" (HLH) subfamily of
transcription regulators.
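
The helical wheel projection referred to above can be illustrated numerically. The following is a
minimal sketch, assuming an ideal alpha helix with 3.6 residues per turn, i.e., 100 degrees of rotation
per residue (a standard value that is not stated in this text). The function name and the choice of
Tyr53 as the 0 degree reference are illustrative assumptions.

```python
def wheel_angle(position, start=53, degrees_per_residue=100):
    """Angular position (degrees) of a residue on a helical wheel, relative to `start`."""
    return ((position - start) * degrees_per_residue) % 360


# Residues named in the text, keyed by label with their sequence positions.
residues = {
    "Tyr53": 53,
    "Val55": 55,
    "Phe59": 59,
    "Gln62": 62,
    "Ile63": 63,
    "Ser64": 64,
}

for name, position in residues.items():
    print(name, wheel_angle(position))
```

Under these assumptions Val55, Phe59 and Ile63 project at 200, 240 and 280 degrees, i.e., within an
80 degree arc, while Tyr53 and Ser64 project at 0 and 20 degrees and Gln62 at 180 degrees.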

The helix-loop-helix (HLH) family of transcriptional regulators is involved in the control of
different cellular differentiation phenomena such as neurogenesis, haematopoiesis, myogenesis and
angiogenesis. The HLH proteins are found in all eukaryotic organisms ranging from the yeast
Saccharomyces cerevisiae to humans (reviewed by Massari ME and Murre C. (2000) Molecular and
Cellular Biology, 20 (2):429-440). The HLH proteins bind DNA as dimers, and different members of
the HLH family bind either as homodimers or as heterodimers with other members of the family. The
presence in a cell of a large repertoire of distinct complexes that can bind to a particular DNA sequence
element suggests that competition for DNA binding may play a regulatory role.

Members of the helix-loop-helix (HLH) family of transcriptional regulation proteins share a 
common structural element, i.e. a stretch of 40-50 amino acids containing two short amphipathic alpha-
helices separated by a linker region (the loop) of varying length (Murre C et al. (1989) Cell 56:777-783).
This element was initially identified as a region of homology among c-myc, the muscle determination
gene MyoD (Davis RL et al. (1987) Cell 51:987-1000) and the Drosophila achaete-scute complex (AS-C) 
involved in neural determination (Villares R. and Cabrera CV (1987) Cell 50:415-424). The HLH 
proteins form both homodimers and heterodimers by means of interaction between the hydrophobic 
residues on the corresponding faces of the two helices to give a parallel four-helix bundle structure
(Adrian R et al. (1993) Nature, 363:38-45; Ellenberger T et al. (1994) Genes Dev. 8:970-980). The alpha
helical regions are usually 15-16 amino acids long with hydrophobic residues at every third and fourth 
position, and each helix contains several conserved residues (Murre C et al. (1989) Cell, 56:777-783; 
Benezra R. et al. (1990) Cell, 61:49-59). 

The HLH protein family is subdivided into two major groups: the so-called "bHLH" and "non
basic HLH" subfamilies. Proteins of the bHLH family contain a conserved highly basic region
immediately N-terminal to the first helix (known as bHLH structure), and mutagenesis experiments on
MyoD protein confirm that this region is responsible for sequence-specific binding to the "E-box", a
consensus DNA motif for bHLH proteins (Davis RL. et al. (1990) Cell, 60: 733-746). A dimeric bHLH
protein (either homodimeric or heterodimeric, but in which both subunits contain a basic region) is able
to bind to DNA. In general, the bHLH proteins fall into two categories: class A consists of proteins that
are ubiquitously expressed, including mammalian E12/E47 and fly da, whereas class B consists of
proteins that are expressed in a more tissue-specific manner, including mammalian MyoD and fly AS-C.
In most cases, the tissue-specific bHLH proteins preferentially heterodimerize with ubiquitous partners.

The non basic HLH subfamily contains proteins that lack a basic region and are therefore unable to bind
to DNA but that can form homo- or heterodimers through their HLH motif. Indeed, heterodimeric complexes
between non basic HLH and bHLH proteins fail to bind to DNA and negatively modulate bHLH
protein-mediated transcription activation. This phenomenon was first demonstrated in a MyoD/Id
regulation model (Benezra R. et al. (1990) Cell, 61:49-59). The MyoD gene product is able to activate
previously silent muscle-specific genes when introduced into a large variety of differentiated cell types.
MyoD proteins form either homodimers or heterodimers with other bHLH proteins such as E12 or E47,
and bind to the E-box consensus motif to activate myogenesis. The Id gene, conserved from batrachians to
mammals (Wilson R et al. (1995) Mech.Dev. 49:211-222; Sawai S et al. (1997) Mech.Dev. 65:175-185;
Norton JD et al. (1998) Trends in Cell Biology 8:58-65), lacks a basic region adjacent to its HLH motif but
is able to specifically dimerize with MyoD, E12 or E47 and has been shown to subsequently
attenuate the heterodimer's ability to bind DNA. Additionally, overexpression of Id inhibits MyoD-dependent
gene activation in in vivo transfection experiments. Id proteins may function either to repress
directly the activity of tissue-restricted bHLH proteins by rendering them non-functional or, more likely,
to sequester the ubiquitous bHLH proteins and prevent them from forming active heterodimers with
the tissue-restricted bHLH proteins (reviewed by Norton JD et al. (1998) Trends in Cell Biology 8:58-65).

The possibility that the Id protein behaves as a dominant-negative regulator to repress MyoD
protein activity through the formation of nonfunctional heterodimeric complexes is considerably
strengthened by the following findings in Drosophila. In Drosophila, the development of the peripheral
nervous system is positively regulated by the two structurally related bHLH proteins, AS-C and
daughterless (da), since loss of either activity results in loss of sensory organ development. The
extramacrochaetae (Emc) product, which belongs to the non basic HLH subfamily, was shown to antagonize the
activity of AS-C and da through the formation of nonfunctional heterodimers with the bHLH proteins
(Hillary M et al. (1990) Cell 61:27-38; Garrell J et al. (1990) Cell 61:39-48).

Human Id genes including human Id1, Id2, Id3 and Id4 have been identified and localized
(reviewed by Norton JD et al. (1998) Trends in Cell Biology 8:58-65). The bHLH proteins and Id proteins
are thought to be involved in the regulation of apoptosis. Differentiation and development of T- and
B-lymphocytes in the immune system are positively regulated by the combination of ubiquitous E proteins and
lymphocyte-restricted bHLH proteins. Disruption in gene expression from either class results in severe
perturbation of T- and B-lymphocyte development (Bain G et al. (1997) Mol Cell Biol 17:4782-4791;
Zhuang et al. (1996) Mol Cell Biol 16:2898-2905). Cell-arrested T thymocytes undergo massive
apoptosis when the Id1 gene is overexpressed (Kim D (1999) Mol Cell Biol 19(12):8240-53).
Overexpression of the Id1 gene product also results in apoptosis in neonatal and adult cardiac myocytes in
culture (Tanaka K et al. (1998) J Biol Chem 273(40):25922-25928).

Id1 and Id3 proteins are also required to support angiogenesis. Quiescent adult endothelial cells
express minimal levels of the Id proteins, whereas Id expression is upregulated in angiogenic endothelial
cells. Partial loss of these proteins in Id1+/-Id3-/- double knockout mice impairs angiogenesis, resulting in
resistance to tumour growth (Lyden D et al. (1999) Nature 401:670-677). In addition, a significant
overexpression of mRNA and protein levels of Id1, Id2 and Id3 has been found in patients with pancreatic
cancer (Maruyama H et al. Am J Pathol (1999) 155(3):815-822). A correlation between Id1 gene upregulation
and the aggressive phenotype of human breast cancer cells has also been reported (Lin CQ et al. (2000)
Cancer Res 60(5):1332-40).

Thus, identification and cloning of members of the HLH family, and especially of the non basic
HLH subfamily, is necessary to enrich our knowledge about the biological importance of the HLH
transcription factor network and, furthermore, to provide insights and tools for disorders linked to
dysregulation of HLH-mediated transcription.

It is believed that the protein of SEQ ID NO: 394 or part thereof plays a role in the regulation of
transcription activation, probably as a member of the HLH family, preferably of the non basic HLH
subfamily. More particularly, the protein of the invention is thought to be able to antagonize the activity
of members of the bHLH family through the formation of heterodimers. Preferred polypeptides of the
invention are polypeptides comprising the amino acids of SEQ ID NO: 394 from positions 13 to 28, from
positions 53 to 68, and from positions 13 to 68. Other preferred polypeptides of the invention are
fragments of SEQ ID NO: 394 having any of the biological activities described herein.
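
As an illustration only, the position ranges recited above can be read as 1-based, inclusive residue coordinates. The following minimal Python sketch shows how such fragments might be cut from a sequence under that assumption. The function name `fragment` and the placeholder sequence `toy` are hypothetical and are not part of the sequence listing; the actual sequence of SEQ ID NO: 394 is not reproduced here.

```python
def fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of a protein sequence.

    Positions are assumed to be 1-based and inclusive, as is usual
    for sequence listings.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"positions {start}-{end} fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


# Hypothetical 80-residue placeholder, NOT the real SEQ ID NO: 394.
toy = "ACDEFGHIKLMNPQRSTVWY" * 4

first_region = fragment(toy, 13, 28)   # 16 residues
second_region = fragment(toy, 53, 68)  # 16 residues
whole_motif = fragment(toy, 13, 68)    # 56 residues spanning both regions
```

The longer fragment necessarily begins with the first region and ends with the second, which gives a quick consistency check on the coordinates.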

The dimerization ability of the protein of the invention or part thereof, which is characteristic of
the HLH family, may be assayed using any of the assays known to those skilled in the art. For example,
interacting protein partners, especially members of the bHLH subfamily, may be identified by
screening cDNA expression libraries as described for the identification of some HLH transcription
factors such as E12 and E47 (Murre C et al. (1989) Cell 56:777-783), Max (a Myc binding factor)
(Elizabeth M et al. (1991) Science 251:1217) as well as Id (Benezra R et al. (1990) Cell 61:49-59).
Alternatively, the helix-loop-helix motif in the protein of the invention could be used by those skilled in
the art as a "bait protein" in a well established yeast two-hybrid system to identify its interacting
protein partners in vivo from cDNA libraries derived from different tissues or cell types of a given
organism. Alternatively, the protein of the invention or part thereof could be used by those skilled in the art
in mammalian cell transfection experiments. When fused to a suitable peptide tag such as a [His]6 tag in a
protein expression vector and introduced into cultured cells, the expressed fusion protein can be
immunoprecipitated together with its potential interacting proteins by using an anti-tag peptide antibody. This
method could be chosen either to identify the associated partner or to confirm the results obtained by
other methods such as those just mentioned.

An object of the invention relates to compositions and methods using the protein of the invention 
or part thereof to dysregulate gene transcription, preferably transcription mediated by HLH regulators 
either in vitro or in vivo, through overexpression of the protein of the invention using any means known 
to those skilled in the art. 

The protein of the invention or part thereof could be used to induce apoptosis of a specific cell type
under either physiological or pathological conditions. In a preferred embodiment, the apoptosis-active
polypeptide is added to an in vitro culture of mammalian cells in an amount effective to induce apoptosis.
In another preferred embodiment, the apoptosis-active polypeptide is expressed under the control of a
promoter which may be activated under precise conditions. In particular, such conditional expression of
an apoptosis-active polypeptide upon demand may be very useful to get rid of cells that have become
unwanted, for example in applications where such cells have been used for cellular therapy and are
no longer needed. Another example of application is expression under the control of a promoter
that becomes active after infection by a given microorganism, thus resulting in the death of the infected
cells only. Furthermore, the protein of the invention or part thereof may be useful in the diagnosis, the
treatment and/or the prevention of disorders in which apoptosis is beneficial, including but not limited to
disorders linked to abnormal cellular proliferation such as those described below.

In another embodiment, the protein of the invention or part thereof can be used to diagnose, treat
and/or prevent disorders linked to overexpression of HLH proteins, such as cancer and other disorders
relating to abnormal cellular differentiation, proliferation, or degeneration, including hyperaldosteronism,
hypocortisolism (Addison's disease), hyperthyroidism (Graves' disease), hypothyroidism, colorectal
polyps, gastritis, gastric and duodenal ulcers, ulcerative colitis, Crohn's disease, and neurodegenerative
disorders such as Parkinson's or Alzheimer's diseases, using any methods and/or techniques described
herein. In addition, the protein of the invention or part thereof may be used to evaluate disease
progression and the efficiency of clinical treatment. The protein of the invention or part thereof could also
be used as a molecular target for anti-angiogenesis drug design. Inhibition of protein expression could be
achieved by many means known to those skilled in the art including those described in the present
application. For example, an antisense nucleotide or triple helix strategy could be developed to block
protein synthesis. Alternatively, the expressed protein of the invention might be neutralized by using
a specific monoclonal antibody using techniques known to those skilled in the art including those described
in Peverali FA et al (1994) EMBO J. 13:4291-4301; Barone MV et al. (1994) Proc.Natl.Acad.Sci.USA
91:4985-4988; and Haza ET et al (1994) J.Biol.Chem. 269:2139-2145.

Protein of SEQ ID NO: 466 (internal designation 184-4-2-0-D3-CS)

The protein of SEQ ID NO: 466, overexpressed in liver and encoded by the cDNA of SEQ ID
NO: 225, displays a zinc finger motif of RING type (C3HC4) (Pfam signature from positions 41 to 81,
Prosite signature from positions 56 to 65) and a B-box zinc finger motif (Pfam signature from positions
110 to 153). In addition, the protein of the invention is predicted to have a nuclear localization.

It is believed that the protein of SEQ ID NO: 466 or part thereof is a zinc binding protein,
preferably able to bind nucleic acids, more preferably a transcription factor. Preferred polypeptides of the
invention are polypeptides comprising the amino acids of SEQ ID NO: 466 from positions 41 to 81 (RING
zinc finger), and from positions 110 to 153 (B-box domain). Other preferred polypeptides of the invention
are fragments of SEQ ID NO: 466 having any of the biological activities described herein.

Protein of SEQ ID NO: 267 (internal designation 116-111-1-0-H9-CS)

The protein of SEQ ID NO: 267 encoded by the extended cDNA SEQ ID NO: 26 exhibits an
Emotif zinc finger domain, C2H2 type, from positions 185 to 202, and is thought to be localized in the
nucleus.

It is believed that the protein of SEQ ID NO: 267 or part thereof is a zinc binding protein,
preferably able to bind nucleic acids, more preferably a transcription factor. Preferred polypeptides of the
invention are polypeptides comprising the amino acids of SEQ ID NO: 267 from positions 185 to 202.
Other preferred polypeptides of the invention are fragments of SEQ ID NO: 267 having any of the
biological activities described herein.

Protein of SEQ ID NO: 277 (internal designation 160-103-1-0-F11-CS)

The protein of SEQ ID NO: 277 encoded by the extended cDNA SEQ ID NO: 36 exhibits a Pfam
DHHC zinc finger domain from positions 140 to 204.

It is believed that the protein of SEQ ID NO: 277 or part thereof is a zinc binding protein,
preferably able to bind nucleic acids, more preferably a transcription factor. Preferred polypeptides of the
invention are polypeptides comprising the residues of SEQ ID NO: 277 from positions 140 to 204. Other
preferred polypeptides of the invention are fragments of SEQ ID NO: 277 having any of the biological
activities described herein.

Protein of SEQ ID NO: 272 (internal designation 145-25-3-0-B4-CS) 

The protein of SEQ ID NO: 272 encoded by the extended cDNA SEQ ID NO: 31 shows
homology with numerous zinc binding proteins. In addition, the protein of the invention exhibits the
Pfam RING zinc finger signature from positions 87 to 129. The protein of SEQ ID NO: 272 has a variant,
i.e. the protein of SEQ ID NO: 273 encoded by the extended cDNA SEQ ID NO: 32 and thought to have
the same function and utilities.

It is believed that the protein of SEQ ID NO: 272 or part thereof is a zinc binding protein,
preferably able to bind nucleic acids or proteins, more preferably a transcription factor. Preferred
polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID NO: 272 from
positions 87 to 129. Other preferred polypeptides of the invention are fragments of SEQ ID NO: 272
having any of the biological activities described herein.

Hydrolases and inhibitors 

The invention relates to compositions and methods using proteins of the invention having a
hydrolytic activity, herein referred to as HYP, such as the ones described in this section and those
containing a hydrolytic domain as shown in Table VI, or parts thereof, preferably fragments comprising
a hydrolytic domain, or derivatives thereof.

The invention relates to methods and compositions using HYP or a fragment thereof to hydrolyze
one or several substrates, alone or in combination with other substances. For example, the protein of the
invention or part thereof is added to a sample containing the substrate(s) in conditions allowing
hydrolysis, and allowed to catalyze the hydrolysis of the substrate(s). Hydrolyzed substrates are then
detected using standard methods known to those skilled in the art. The protein of the invention or part
thereof can also be added to samples as a "cocktail" with other hydrolytic enzymes, such as other
peptidases, for example to decontaminate surgical instruments using methods described in US patent
5,489,531. The advantage of using a cocktail of hydrolytic enzymes is that one is able to hydrolyze a
wide range of substrates without necessarily knowing the specificity of each enzyme. Using a cocktail of
hydrolytic enzymes also protects a sample from a wide range of future unknown contaminants from a vast
number of sources. Alternatively, HYP or part thereof may be bound to a chromatographic support,
either alone or in combination with other hydrolytic enzymes, using techniques well known to those
skilled in the art, to form an affinity column to remove the substrate. Immobilization facilitates removal
of the enzyme from the batch of product and subsequent reuse of the enzyme.

Immobilization of the enzyme or part thereof can be accomplished, for example, by adding a
cellulose-binding domain to the protein through the modification of the DNA sequence coding for the
protein or part thereof. One of skill in the art will understand that other methods of immobilization could
also be used and are described in the available literature. Alternatively, the same methods may be used to
identify new substrates.

In another embodiment, HYP or part thereof may be used to identify or quantify the amount of a
given substrate in a biological sample. In a preferred embodiment, HYP or part thereof is catalytically
inactivated, i.e. rendered capable of binding but not hydrolyzing a given substrate, using any of the methods known
to those skilled in the art including those which produce a mutant enzyme, a recombinant enzyme, or a
chemically inactivated enzyme. The catalytically inactive protein of the invention is then incubated with
an aliquot of a biological sample under conditions suitable for binding of the inactive enzyme to the
substrate. Then, the bound enzyme is detected to assess the presence or amount of the substrate in the
biological sample. In another preferred embodiment, HYP or part thereof is used in assays and diagnostic
kits for the identification and quantification of substrates in a biological sample. These assays can be
based, for example, on standard enzyme-linked immunosorbent assays (ELISA) or any other technique
known to those skilled in the art. In addition, HYP or part thereof may be used to identify, e.g. using screens
based on standard assays such as those described above, inhibitors of the enzyme for mechanistic and
clinical applications. Such inhibitors may then be used to identify or quantify HYP in a sample, and to
diagnose, treat or prevent any of the disorders where the protein's activity is undesirable and/or
deleterious.

Protein of SEQ ID NO: 400 (internal designation 160-54-1-0-F7-CS)

The protein of SEQ ID NO:400, encoded by the cDNA of SEQ ID NO: 159, exhibits two putative
transmembrane domains encompassing amino acids 50-70 and 127-147 as predicted by the software
TopPred II (Claros and von Heijne, CABIOS applic. Notes, 10:685-686 (1994)). It also displays the
Prosite carboxypeptidase zinc-binding region signature PS00133 at positions 117-127. It is predicted by
the psort software (see Nakai K and Horton P, Trends Biochem Sci. 1999 Jan;24(1):34-6) to localize to
the nucleus with a high probability (73.9%). Finally, it is specifically expressed in fetal brain and shows
no homology to previously known proteins.
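
By way of illustration, a Prosite-style signature such as the one mentioned above is a pattern over amino acid letters that can be translated into a regular expression and scanned against a sequence. The sketch below handles only a subset of the Prosite pattern syntax (single residues, `x` wildcards, `[...]` allowed sets, `{...}` excluded sets, repeat counts, and `<`/`>` anchors on whole elements). The function names, the pattern `C-x(2)-C-x(3)-H` and the test sequence are hypothetical placeholders; the actual PS00133 pattern and the sequence of SEQ ID NO: 400 are not reproduced here.

```python
import re

# One pattern element: a residue, x, [allowed] or {excluded},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern (subset of the syntax) to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]


# Hypothetical pattern and sequence, for illustration only.
hits = find_motif("AACAACAAAHAA", "C-x(2)-C-x(3)-H")
```

Reporting spans as 1-based inclusive positions keeps the output directly comparable with position ranges quoted in the style used throughout this description.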

Carboxypeptidase enzymes hydrolyze the terminal amino acid of a protein or peptide. A novel
family of carboxypeptidases, localized in the nucleus and with a carboxypeptidase-dependent
transcriptional activity, has emerged only recently. Its first member, AEBP1, was previously identified as
a 3T3 preadipocyte factor implicated in the repression of aP2 gene expression. AEBP1 stands for
"AE-1 Binding Protein," where AE-1 is a regulatory element of the adipose P2 gene (aP2), a gene
involved in triglyceride metabolism and activated in adipocytes. Its own expression is abolished during
adipocyte differentiation (He GP et al., Nature 378:92-96(1995)). AEBP1 was subsequently shown to
play a similar role in the differentiation of osteoblastic cell lines (Ohno I et al., Biochem Biophys Res
Commun. 1996 Nov 12;228(2):411-4) and vascular smooth muscle cells (Layne MD et al., J. Biol. Chem.
273:15654-15660(1998)). It was proposed that AEBP1 acts as a negative transcription factor by cleaving
proteins involved in transcription, a new feature in transcription regulation. Recent evidence further
suggests that its transcriptional activity is itself attenuated by binding to G-protein subunits (Park JG et
al., EMBO J. 1999 Jul 15;18(14):4004-12) and stimulated by DNA binding (Muise AM and Ro HS,
Biochem J. 1999 Oct 15;343 Pt 2:341-5).

It is believed that the protein of SEQ ID NO:400 plays a role in cell signaling, nuclear
transcriptional activity and in the differentiation of several cell types, especially those found in the
developing brain (including but not limited to neurons). Preferred polypeptides of the invention are
polypeptides having any of the biological activities described herein.

One embodiment of the present invention relates to compositions and methods using the protein
of the invention or part thereof as a marker for specific cell compartments (especially the nucleus) and/or
tissue types (especially fetal brain). For example, the protein of the invention or part thereof may be used
to generate specific antibodies which would in turn allow the visualization of nuclear structures by
methods well-known to those of skill in the art. In a similar fashion, antibodies raised against the protein
of the invention may be used to identify particular developmental stages (fetal for instance) and/or given
tissue types (brain for instance), as the protein of the invention is specifically expressed in brain tissues at
a fetal stage. Antibodies and antiserum can also be used to inhibit undesirable carboxypeptidase activities
in in vitro experiments and cell cultures, as well as in biological samples and in vivo. Alternatively,
quantitative analysis or detection of the protein of the invention, or of nucleic acids encoding the protein,
can be carried out by any other technique known to those skilled in the art.

In another embodiment, the protein of the invention may be used to target heterologous
compounds (polypeptides or polynucleotides) to the developing brain and/or the cell nucleus. For
instance, a chimeric protein composed of the protein of the invention recombinantly or chemically fused
to a protein or polynucleotide of therapeutic interest would allow the delivery of the therapeutic
protein/polynucleotide specifically to the above-mentioned cellular/tissue targets (nucleus, fetal brain).

In another embodiment, the present invention relates to methods and compositions using the
protein of the invention or a fragment thereof to hydrolyze one or several substrates, alone or in
combination with other substances. The ability of the present protein to hydrolyze any particular
substrate can easily be determined by carrying out a hydrolysis reaction using standard assay techniques
such as the ones described by Slusher et al. (Slusher et al. - Prostate - 2000, 44(1): 55-60) or any other
technique well known to those skilled in the art. Potential substrates are any substance containing a
peptide bond, more specifically a C-terminal peptide bond. Such substances include, but are not limited
to, polypeptides, folic acid and its analogues (e.g. methotrexate). For example, the protein of the
invention or part thereof is added to a sample containing the substrate(s) in conditions allowing
hydrolysis, and allowed to catalyze the hydrolysis of the substrate(s). Hydrolyzed substrates are then
detected using standard methods known to those skilled in the art.

In a preferred embodiment, the protein of the invention or part thereof may be used to modulate
cellular transcriptional activity, thereby modulating cellular differentiation. Specifically, as nuclear
carboxypeptidases play a role in inhibiting transcription associated with differentiation, an increase
in the activity or expression of the protein can be used to inhibit differentiation. The ability to inhibit
differentiation has a number of uses, for example during the cultivation of undifferentiated pluripotent
cells to maintain the cultured cells in an undifferentiated state until the need for a given cell type arises (in
cases of grafts for instance). The level of the protein activity or expression can be increased in any of a
number of ways, including by introducing a polynucleotide encoding the protein into cells, by
administering the protein itself to cells, or by administering to cells a compound that increases protein
activity or expression. Alternatively, the protein of the invention can be inhibited, thereby enhancing
cellular differentiation. The ability to promote differentiation has many uses, including in the treatment
or prevention of cancer, as cancer cells are often in a relatively undifferentiated state, and cellular
differentiation is typically accompanied by growth arrest.

In another embodiment, the protein of the invention or part thereof may be used to diagnose, treat
and/or prevent disorders where the presence of substrates, for example excess proteins or peptides, is
undesirable or deleterious. Such disorders include, but are not limited to, cancer, neurodegenerative
disorders such as Parkinson's and Alzheimer's diseases, and diabetes. In another embodiment, the protein
of the invention or part thereof may be used to identify or quantify the amount of a given substrate (e.g. a
peptide, folic acid, or methotrexate) in a biological sample. In a preferred embodiment, the protein of the
invention or part thereof is used in assays and diagnostic kits for the identification and quantification of
substrates in a biological sample.

In a most preferred embodiment, the protein of the invention or part thereof can be used in cancer
chemotherapies in rescue therapy following toxic high dose methotrexate regimes. Many
carboxypeptidases can cleave the C-terminal glutamate moiety from folic acid and its analogues, such as
methotrexate. The key role of reduced folates as coenzymes in many biological pathways, including those
leading to DNA synthesis via the pyrimidines and purines, has made folic acid a target molecule for
chemotherapy. Tumor cells grow rapidly and have a high rate of nucleic acid synthesis. Depletion of
folic acid has cytotoxic effects, primarily in replicating tissues, and can inhibit growth of tumors with
high folic acid requirements. Many carboxypeptidases can directly deplete folate by hydrolytic removal
of its glutamate moiety. In cancer chemotherapy, methotrexate (4-amino-N10-methyl-pteroyl-glutamate)
is commonly used to deplete the pool of reduced folates by inhibiting dihydrofolate reductase (DHFR),
which catalyses the reduction of folates into the biologically active tetrahydrofolate form, essential in the
biosynthesis of all folate coenzymes. Thus, the protein of the invention or part thereof could be used in
rescue therapy following toxic high-dose regimes such as described by Widemann et al. (Widemann B. et
al. - Proc. Am. Assoc. Cancer Res. - 1995, 36, p232) and Chabner et al. (Chabner B. et al. - Nature -
1972, 239, p395-397), which disclosures are hereby incorporated by reference in their entirety. The basis of
this strategy is that hydrolysis of methotrexate produces 4-amino-N10-methyl-pteroate, which is about 100
times less active as an inhibitor of DHFR.

In another preferred embodiment, the protein of the invention or part thereof can be used in an
enzyme/prodrug strategy to treat a number of pathologies, especially those treated with drugs associated
with severe side effects, including, but not limited to, autoimmune diseases and chronic inflammatory
diseases such as rheumatoid arthritis, and cancer chemotherapy. These side effects can be mainly
explained by the fact that the in vivo selectivity of the drugs used is too low (for example, the inadequate
selectivity between tumor and normal cells of most anticancer drugs is well known and their toxicity to
normal tissues is dose limiting). In the first phase of one example of such a protocol, a conjugate of the
protein of the invention or part thereof and an antibody to a tissue specific antigen (for example, tumor
specific antigens in the case of cancer chemotherapy) is administered. After a delay to allow residual
enzyme conjugate to be cleared from the blood, a relatively non-toxic compound is administered to the
patient. This non-toxic compound is a substrate of the protein of the invention, and is converted by the
protein into a substantially more toxic compound. Thus, because of the previous, targeted administration
of the protein of the invention, when the non-toxic compound is administered, the toxic compound is only
produced in the vicinity of the cells targeted by the fusion protein. This two-phase approach has been
termed antibody-directed enzyme-prodrug therapy (ADEPT) and is reviewed by Melton et al.
(Melton R. et al. - J. Natl. Cancer Inst. - 1996, 88, p153-165). Alternatively, the first phase can be
replaced by a gene therapy approach resulting in the de novo synthesis of the protein of the invention or
part thereof by cells from the targeted tissue; this has been termed gene-dependent enzyme/prodrug
therapy (GDEPT). Another advantage of these two approaches (ADEPT and GDEPT) is that a single
enzyme molecule is capable of activating many prodrug molecules.

Protein of SEQ ID NO: 242 (internal designation 119-003-4-0-C2-CS)

The protein of SEQ ID No: 242, encoded by the cDNA of SEQ ID No: 1, is homologous to
proteins of the M20 metallopeptidase family (EC 3.4.17.X). The protein of the invention is
overexpressed in the spinal cord and the brain.

Members of the M20 metallopeptidase family are all peptidases (i.e. enzymes able to hydrolyze
peptide bonds); furthermore, they are all exopeptidases, which means that they can hydrolyze the terminal
amino acid of a protein or peptide. Members of the M20 peptidase family are glutamate
carboxypeptidases, which are capable of releasing the C-terminal glutamate residue, by hydrolysis, from a
wide range of N-acyl groups, including peptidyl, aminoacyl, benzoyl, benzyloxycarbonyl, folyl, and
pteroyl groups, and physiologically are involved in the catabolism of proteins. M20 carboxypeptidases
are either monomeric or homodimeric (i.e. two identical proteins assembled to form the enzyme). In order
to be active, metallopeptidases must be associated with a metallic cofactor (either zinc or cobalt
depending on the enzyme). The most studied carboxypeptidase of the M20 family is carboxypeptidase
G2 (CPG2) (EC 3.4.17.11), a bacterial enzyme from Pseudomonas sp. (strain RS-16). CPG2 is a dimeric
zinc carboxypeptidase that cleaves the C-terminal glutamate moiety from a number of molecules.

The protein of SEQ ID No: 242 includes the Pfam signature for M20 peptidase (positions 107 to
451). The protein of SEQ ID No: 242 also includes a number of amino acids that are conserved
throughout the M20 protease family, especially those that interact with the metal cofactor. Preferred
polypeptides of the invention are polypeptides of SEQ ID No: 242 that include the highly conserved
amino acids: 133, 135, 149, 163, 200, 201 and/or 262, which are present in over 80% of the members of
the M20 peptidase family, and/or amino acids 139, 157, 162, 16, 367 and/or 377, which are present in
over 60% of the members of the M20 peptidase family. Of particular interest are amino acids 133, 166,
201 and 262, which by homology are probably involved in the interaction with the metal cofactors. Thus
it is believed that the protein of SEQ ID No: 242 or part thereof is a peptidase, preferably a
carboxypeptidase, more preferably a metallocarboxypeptidase of the M20 family. Other preferred
polypeptides of the invention are any fragments of SEQ ID No: 242 having any of the biological activities
described herein.

Carboxypeptidase activity on specific substrates can easily be determined by
carrying out the hydrolysis using standard assay techniques such as the ones described by Slusher et al.
(Slusher et al. - Prostate - 2000, 44(1): 55-60) or any other technique well known to those skilled in the
art. Potential substrates are any substance containing a peptide bond, more especially C-terminal peptide
bonds, and even more specifically, C-terminal glutamate. Such substances include, but are not limited to,
peptides, folic acid and its analogues (e.g. methotrexate).

In an embodiment, the protein of the invention or part thereof could be used to develop assay tools 
to identify brain and spinal cord tissue, since the protein of the invention is overexpressed in these tissues. 

In still another embodiment, the protein of the invention or part thereof may be used to diagnose, 
treat and/or prevent disorders where the presence of substrates, for example excess proteins, is 
undesirable or deleterious. Such disorders include, but are not limited to, cancer, neurodegenerative 
disorders such as Parkinson's and Alzheimer's diseases, and diabetes. In a most preferred embodiment, 
the protein of the invention or part thereof can be used in cancer chemotherapies in rescue therapy 
following toxic high-dose methotrexate regimes. Enzymes of the M20 peptidase family can cleave the 
C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate. The key role of 
reduced folates as coenzymes in many biological pathways, including those leading to DNA synthesis via 
the pyrimidines and purines, has made folic acid a target molecule for chemotherapy. Tumor cells grow 
rapidly and have a high rate of nucleic acid synthesis. Depletion of folic acid has cytotoxic effects, 
primarily in replicating tissues, and can inhibit growth of tumors with high folic acid requirements. 
Enzymes of the M20 peptidase family can directly deplete folate by hydrolytic removal of its glutamate 
moiety. In cancer chemotherapy, methotrexate (4-amino-N10-methyl-pteroyl-glutamate) is commonly 
used to deplete the pool of reduced folates by inhibiting dihydrofolate reductase (DHFR), which catalyses 
the reduction of folates into the biologically active tetrahydrofolate form, essential in the biosynthesis of all 
folate coenzymes. Thus the protein of the invention or part thereof could be used in rescue therapy 
following toxic high-dose regimes such as described by Widemann et al. (Widemann B. et al. - Proc. Am. 
Assoc. Cancer Res. - 1995, 36, p232) and Chabner et al. (Chabner B. et al. - Nature - 1972, 239, p395-397), 
which disclosures are hereby incorporated by reference in their entireties. The basis of this strategy is 
that hydrolysis of methotrexate produces 4-amino-N10-methyl-pteroate, which is about 100 times less active 
as an inhibitor of DHFR. 

In another preferred embodiment, the protein of the invention or part thereof can be used in an 
enzyme/prodrug strategy to treat a number of pathologies, especially those treated with drugs associated 
with severe side effects, including, but not limited to, autoimmune diseases and chronic inflammatory 
diseases such as rheumatoid arthritis, and cancer chemotherapy. These side effects can be mainly 
explained by the fact that the in vivo selectivity of the drugs used is too low (for example, the inadequate 
selectivity between tumor and normal cells of most anticancer drugs is well known and their toxicity to 
normal tissues is dose limiting). In the first phase of one example of such a protocol, a conjugate of the 
protein of the invention or part thereof and an antibody to a tissue-specific antigen (for example, 
tumor-specific antigens in the case of cancer chemotherapy) is administered. After a delay to allow residual 
enzyme conjugate to be cleared from the blood, a relatively non-toxic compound is administered to the 
patient. This non-toxic compound is a substrate of the protein of the invention, and is converted by the 
protein into a substantially more toxic compound. Thus, because of the previous, targeted administration 
of the protein of the invention, when the non-toxic compound is administered, the toxic compound is only 
produced in the vicinity of the cells targeted by the fusion protein. This two-phase approach has been 
termed antibody-directed enzyme-prodrug therapy (ADEPT) and is reviewed by Melton et al. 
(Melton R. et al. - J. Natl. Cancer Inst. - 1996, 88, p153-165). Alternatively, the first phase can be 
replaced by a gene therapy approach resulting in the de novo synthesis of the protein of the invention or 
part thereof by cells from the targeted tissue; this has been termed gene-dependent enzyme/prodrug 
therapy (GDEPT). Another advantage of these two approaches (ADEPT and GDEPT) is that a single 
enzyme molecule is capable of activating many prodrug molecules. 

Protein of SEQ ID NO: 401 (internal designation 160-88-3-0-A8-CS.corr) 

The protein of SEQ ID NO: 401, encoded by the cDNA of SEQ ID NO: 160, is a splicing variant of 
the hypothetical human palmitoyl-protein thioesterase-2 (PPT2) (E.C. 3.1.2.22) (Genbank accession 
number AF020543), which is well conserved among eukaryotes (C. elegans and rodents) and exhibits 
homology with the palmitoyl protein thioesterase-1 (PPT1) (Genbank accession number L42809). The 
product of the cDNA of SEQ ID NO: 160 is shorter than the human PPT2 (280 versus 308 amino acids, 
respectively), with a gap located between positions 174 and 203 of the PPT2 protein. The protein of 
SEQ ID NO: 401 has a variant, the protein of SEQ ID NO: 402 encoded by the cDNA of SEQ ID NO: 
161, thought to have the same functions and utilities. 

PPT1 (E.C. 3.1.2.22) is a well-described protein, widely conserved among the murine, rat, bovine 
and human species (Swissprot accession number P50897). It is a lysosomal enzyme that functions in the 
removal of fatty acids from modified cysteine residues in proteins undergoing degradation (Hofmann S.L. 
et al., Neuropediatrics, 28: 27-30 (1997)). For example, PPT1 catalyses the deacylation of H-ras and the 
alpha subunits of heterotrimeric G proteins in vitro (Camp L.A., J. Biol. Chem., 268: 22566-22574 (1993) 
and 269: 23212-23219 (1994)). Deacylation by PPT1 may be a prerequisite for complete digestion of the 
modified polypeptides. In fact, there is evidence that palmitoylation leads to increased protection against 
proteolytic digestion. Both the salivary mucus glycoprotein (Slomiany B. L., Biochem. Biophys. Res. 
Commun., 151: 1046-1053 (1988)) and chemically acylated bee venom phospholipase A2 (Diaz, R.E., 
Biochim. Biophys. Acta, 830: 52-58 (1985)) are more resistant to treatment with proteinases than their 
deacylated forms. Mutations in the PPT1 enzyme were shown to underlie the hereditary neurodegenerative 
disorder infantile neuronal ceroid lipofuscinosis (Vesa et al., Nature, 376: 584-587 (1995)). 

Recently, Soyombo and Hofmann (J. Biol. Chem., 272: 27456-27463 (1997)) described a second 
lysosomal thioesterase, PPT2, that shares 20% identity with PPT1. The PPT2 enzyme presumably also 
plays a role in lysosomal thioester catabolism but has a substrate specificity distinct from that of PPT1. 
While little is known about the substrate specificity of PPT2, the enzyme is highly active against 
palmitoylated model substrates such as palmitoyl CoA. PPT2 did not hydrolyse the acyl-cysteine bond of 
the protein substrates routinely used to assay PPT1, such as H-Ras and albumin. This finding suggests that 
although both enzymes possess intrinsic palmitoyl thioesterase activity, the "leaving group" recognized 
by the enzymes may differ. One possibility is that PPT2 recognizes palmitoylated protein substrates but 
that these substrates differ from those recognized by PPT1. A second possibility is that PPT2 recognizes a 
novel lipid thioester substrate that is not derived from acylated proteins. Aguado et al. (Biochem. J., 
341: 679-689 (1999)) demonstrated that PPT2 is an acyl thioesterase. However, they cannot distinguish 
between esterase (thioesterase) and lipase activity. PPT2 shows very high S-thioesterase activity towards 
the acyl chains C14:0 > C16:0, moderate activity towards the acyl chains C14:1 > C20:4 ≈ C16:1 ≈ C18:0 ≈ 
C12:0 ≈ C18:2 ≈ C18:3 > C22:1 ≈ C18:1 ≈ C20:0, low activity towards the acyl chains C10:0 and C22:0, and 
no activity towards the acyl chains C24:0, C8:0, C6:0, C4:0 and C2:0. PPT2 has a broader range of action than PPT1, 
although both have a preference for long acyl chains (more than 12 or 14 carbons) over shorter acyl 
chains (less than 12 carbons). Aguado et al. (supra) also presented a detailed characterization of the PPT2 
gene product. The putative 302-residue PPT2 and the protein of the invention contain a hydrophobic 
leader peptide at the N-terminus (signal peptide with a cleavage site predicted at position 34 of the protein 
of the invention), suggesting that they are secretory glycoproteins. Both proteins exhibit two motifs 
located at the N-terminus from positions 108 to 121. One motif is common to triglyceride lipases (from 
positions 110 to 121) and the other to eukaryotic thiol (Cys) proteases (from positions 108 to 121). 
Triglyceride lipases are lipolytic enzymes that hydrolyse the ester bond of triglycerides. The most 
conserved region in all these proteins is centered on a serine residue located in a conserved 
Gly-Xaa-Ser-Xaa-Gly motif. The PPT2 protein and the protein of the invention contain a cysteine residue 
(position 115) instead of the first glycine residue in the motif, but other lipases with one mismatch in the 
consensus have been described (Blow D., Nature, 343: 694-695 (1990)). In the same region as the lipase 
motif, PPT2 and the protein of the invention contain a motif common to the active site of eukaryotic 
thiol (Cys) proteases, but with a leucine residue (position 113) instead of the glycine at position 5 of the 
pattern. In addition, the amino acid sequence of the putative PPT2 shows, at the C-terminus, from 
positions 171 to 186, a motif common to the growth factor and cytokine receptor family, which is not 
present in the protein of the invention. 
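
The lipase motif comparison described above can be illustrated with a short sketch. It scans a peptide for five-residue windows resembling Gly-Xaa-Ser-Xaa-Gly, requiring the central serine and tolerating a substitution at one of the flanking glycines, as in the Cys-for-Gly case mentioned above. The function name `scan_gxsxg`, the mismatch rule and the toy peptide are assumptions made for illustration; they are not part of the disclosure and the peptide is not SEQ ID NO: 401.

```python
def scan_gxsxg(sequence, max_gly_mismatches=1):
    """Find windows resembling the Gly-Xaa-Ser-Xaa-Gly lipase motif.

    The central serine is required; up to max_gly_mismatches of the two
    flanking glycines may be replaced by another residue.
    Returns a list of (1-based start position, window, mismatch count).
    """
    hits = []
    for start in range(len(sequence) - 4):
        window = sequence[start:start + 5]
        if window[2] != "S":
            continue  # the conserved serine must be present
        mismatches = (window[0] != "G") + (window[4] != "G")
        if mismatches <= max_gly_mismatches:
            hits.append((start + 1, window, mismatches))
    return hits


# Hypothetical toy peptide for illustration only; it is not SEQ ID NO: 401.
toy_peptide = "MKTAYCDSLGHAAGWSFGPP"
hits = scan_gxsxg(toy_peptide)
for position, window, mismatches in hits:
    print(position, window, mismatches)
```

Setting `max_gly_mismatches=0` restricts the scan to exact matches of the consensus; a real analysis would more likely rely on curated pattern or profile databases than on this simple window scan.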

Aguado et al. (supra) have found that PPT2 is expressed in cells of the immune system as an 
approximately 42 kDa protein in cell extracts and supernatants and is transcribed as at least five 
different transcripts. The PPT2 gene is located in the class III region of the human MHC, which contains 
several genes encoding proteins with potential roles in the immune system and in inflammation. In 
addition, Aguado et al. (supra) showed that very large amounts of PPT2 are secreted. However, this is not 
in disagreement with an intracellular activity, because the secreted protein could be internalized into the 
cell through a receptor and act on a target located in an intracellular organelle. This mechanism has been 
described for the secreted PPT1, which can be internalized into the cell by the mannose-6-phosphate receptor 
to act in the lysosome (Verkruyse and Hofmann, J. Biol. Chem., 271: 15831-15836 (1996)), and Soyombo 
and Hofmann (J. Biol. Chem., 272: 27456-27463 (1997)) reported that PPT2 binds to the 
mannose-6-phosphate receptor. 

Palmitoylation refers to the posttranslational modification of proteins in which the most common 
fatty acids of the cell (i.e. palmitic, stearic and oleic acids) are attached to the side chain of cysteine 
residues via high-energy thioester linkages (Bizzozero, O.A. et al., Neurochem. Res., 19: 923-933 (1994); 
Casey P.J., Science, 268: 221-225 (1995)). At present, a large number of proteins of diverse origin, 
structure and function are known to be modified with these fatty acids, which attach them to the inner 
surface of the plasma membrane, where they can function optimally (Casey P.J., Science, 268: 221-225 (1995)). 
Being anchored to membranes is necessary for the diverse cellular functions of these modified 
proteins, including signal transduction, vesicle transport and maintenance of the cytoarchitecture. Almost 
every tissue and subcellular organelle contains a characteristic set of palmitoylated proteins. 

The protein of the invention is overexpressed in brain. In recent years a considerable number of 
functionally relevant nervous system proteins, including ion channels, neurotransmitter receptors, signal 
transduction components and cell-adhesion molecules, have been found to be palmitoylated. Although the 
nervous system is not an exception to this rule, both the number of modified proteins in this tissue and the 
dynamic nature of protein palmitoylation suggest that this modification is critical for regulating important 
biological processes and that the addition or removal of the fatty acid serves to regulate the activity of 
these proteins rather than to define their function. 

It is believed that the protein of SEQ ID NO: 401 or part thereof is a hydrolase, preferably acting 
on ester bonds, more preferably a thiolester hydrolase, even more preferably an acyl-thioesterase which, 
as such, plays a role in fatty acid metabolism, in cellular vesicle transport and maintenance of the 
cytoarchitecture, in cellular proteolysis, endocytosis, signal transduction, lysosomal storage, cell 
proliferation and differentiation, and immune and inflammatory responses. The enzyme's substrates are 
compounds preferably containing an ester bond, preferably a thiol ester bond, more preferably an acyl 
thioester bond. Preferred polypeptides of the invention are polypeptides comprising the amino acids of 
SEQ ID NO: 401 from positions 108 to 121, and 110 to 121. Other preferred polypeptides of the 
invention are fragments of SEQ ID NO: 401 having any of the biological activities described herein. The 
hydrolytic activity of the protein of the invention or part thereof may be assayed using any of the assays 
known to those skilled in the art, including those described in Smith et al., Biochem. J., 212: 155 (1983), 
Spencer et al., J. Biol. Chem., 253: 5922 (1978) and Aguado et al. (supra) or in US patent 5,445,942. 

In another preferred embodiment, the protein of the invention or part thereof may be used to 
diagnose, treat and/or prevent disorders where the presence of substrates is undesirable or deleterious. 
Such disorders include but are not limited to infantile neuronal ceroid lipofuscinosis and lysosomal 
diseases. For diagnostic purposes, the expression of the protein of the invention could be investigated 
using any of the Northern blotting, RT-PCR or immunoblotting methods described herein and compared 
to the expression in control individuals. For prevention and/or treatment purposes, the expression of 
the protein of the invention may be enhanced using any of the gene therapy methods described herein or 
known to those skilled in the art. 

In addition, the protein of the invention or part thereof may be used to identify inhibitors for 
mechanistic and clinical applications. Such inhibitors may then be used to identify or quantify the protein 
of the invention in a sample, and to diagnose, treat or prevent any of the disorders where the protein's 
hydrolytic activity is undesirable and/or deleterious, including but not limited to lysosomal diseases, 
neurodegenerative disorders such as infantile neuronal ceroid lipofuscinosis, Parkinson's and Alzheimer's 
diseases, and inflammatory and immune disorders including allergies and leukemia. 

Other objects of the present invention are compositions and methods of targeting heterologous 
compounds, either polypeptides or polynucleotides, to lysosomes by recombinantly or chemically fusing a 
fragment of the protein of the invention to a heterologous polypeptide or polynucleotide. Preferred 
fragments are any fragments of the protein of the invention, or part thereof, that may contain targeting 
signals for lysosomes such as those described in Vitale et al., Mol. Cell. Biol., 20: 7342-52 (2000), 
Blagoveshchenskaya et al., J. Biol. Chem., 273: 2729-37 (1998) and Kornfeld, FASEB J., 1: 462-8 (1987). 
Such heterologous compounds may be used to modulate lysosomal activity. For example, they may be 
used to induce and/or prevent lysosomal protein degradation. Moreover, antibodies binding to the 
protein of the invention or part thereof may be used for detection of the lysosomes using any techniques 
known to those skilled in the art. 

In still another embodiment, the invention relates to methods and compositions using the protein 
of the invention or part thereof as a marker protein to selectively identify tissues, preferably brain tissues. 
For example, the protein of the invention or part thereof may be used to synthesize specific antibodies using any 
techniques known to those skilled in the art, including those described herein. Such tissue-specific 
antibodies may then be used to identify tissues of unknown origin, for example, forensic samples or 
differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate different tissue 
types in a tissue cross-section using immunochemistry. 

Another embodiment of the present invention relates to methods and compositions using the 
protein of the invention or part thereof to modify plant lipid composition using any method known to those 
skilled in the art, including those described by the US patents 5,955,650, 5,945,585 and 5,807,893. 
Indeed, plant lipids have a variety of nutritional uses and many recent research efforts have examined the 
role that saturated and unsaturated fatty acids play in reducing the risk of coronary heart disease. In the 
past, it was believed that mono-unsaturates, in contrast to saturates and poly-unsaturates, had no effect on 
serum cholesterol and coronary heart disease risk. Several recent human clinical studies suggest that diets 
high in mono-unsaturated fat and low in saturated fat may reduce the "bad" (low-density lipoprotein) 
cholesterol while maintaining the "good" (high-density lipoprotein) cholesterol (Mattson et al., Journal of 
Lipid Research 26: 194-202 (1985)). 

In still another embodiment, the protein of the invention or part thereof may be used in enzyme 
replacement therapy, due to the ability of cells to take up exogenously supplied protein and target it to 
lysosomes (Neufeld E.F., Annu. Rev. Biochem. 60: 257-280 (1991); Brady R.O. et al., J. Inherit. Metab. Dis. 
17: 510-519 (1994)), or in bone-marrow transplantation (Hoogerbrugge P.M. et al., Lancet, 345: 1398-1402 
(1995)), as bone-marrow-derived microglial cells are believed to penetrate the blood-brain barrier 
and may theoretically be able to provide sufficient enzyme to correct the metabolic defect in neurons 
(Krivit W., Cell Transplant., 4: 385-392 (1995)). The protein of the invention or part thereof may also be 
used in genetic engineering of transplanted cells (Salvetti A. et al., Br Med J. 51: 106-122 (1995)) or 
neural progenitor cell engraftment (Snyder E.Y., Nature, 374: 367-370 (1995)) using any technique 
known to those skilled in the art. 

Protein of SEQ ID NO: 254 (internal designation 106-006-1-0-E3-CS) 

Angiogenin is a member of the pancreatic RNase superfamily of proteins. Its mechanism of 
action is postulated to involve multiple interactions with other proteins through specific regions on the 
molecular surface of angiogenin. Potential partners of angiogenin include heparin, plasminogen, elastase, 
angiostatin, actin, and a 170 kDa receptor on the surface of endothelial cells [Strydom, D. J. (1998) Cell. 
Mol. Life Sci. 54, 811-824]. 

Angiogenin is required for the process of angiogenesis. Tumor growth requires angiogenesis, and 
several anti-angiogenic agents have been produced and are currently in the clinical trial stage. It has also 
been shown that recurrent gastric cancer patients had a much higher serum concentration of angiogenin 
than primary gastric cancer patients [Shimoyama, S. and Kaminishi, M. (2000) J. Cancer Res. Clin. 
Oncol. 126, 468-474]. Therefore, angiogenin can be used as a diagnostic marker for the evaluation of 
cancer aggressiveness or as an early marker for recurrence over a follow-up period. 

Angiogenin is a potent inducer of angiogenesis [Fett, J. W.; Strydom, D. J.; Lobb, R. R.; 
Alderman, E. M.; Bethune, J. L.; Riordan, J. F.; and Vallee, B. L. (1985) Biochemistry 24, 5480-5486]. 
Angiogenesis is a complex process of blood vessel formation comprising several separate but 
interconnected steps at the cellular and biochemical level, including: (i) activation of endothelial cells by 
the action of an angiogenic stimulus, (ii) adhesion and invasion of activated endothelial cells into the 
surrounding tissues and migration toward the source of the angiogenic stimulus, and (iii) proliferation and 
differentiation of endothelial cells to form a new microvasculature [Folkman, J. and Shing, Y. (1992) J. 
Biol. Chem. 267, 10931-10934; Moscatelli, D. and Rifkin, D. B. (1988) Biochim. Biophys. Acta 948, 67-85]. 

Angiogenin has been demonstrated to induce most of the individual events in the process of 
angiogenesis, including binding to endothelial cells [Badet, J.; Soncin, F.; Guitton, J.D.; Lamare, O.; 
Cartwright, T.; and Barritault, D. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 8427-8431], stimulating second 
messengers [Bicknell, R. and Vallee, B. L. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 5961-5965], 
mediating cell adhesion [Soncin, F. (1992) Proc. Natl. Acad. Sci. U.S.A. 89, 2232-2236], activating 
cell-associated proteases [Hu, G. F. and Riordan, J. F. (1993) Biochem. Biophys. Res. Commun. 197, 682-687], 
inducing cell invasion [Hu, G-F.; Riordan, J. F.; and Vallee, B. L. (1994) Proc. Natl. Acad. Sci. 
U.S.A. 91, 12096-12100], inducing proliferation of endothelial cells [Hu, G-F.; Riordan, J. F.; and Vallee, 
B. L. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 2204-2209] and organizing the formation of tubular 
structures from the cultured endothelial cells [Jimi, S-I.; Ito, K-I.; Kohno, K.; Ono, M.; Kuwano, M.; 
Itagaki, Y.; and Isikawa, H. (1985) Biochem. Biophys. Res. Commun. 211, 476-483]. Angiogenin has 
also been shown to undergo nuclear translocation in endothelial cells via receptor-mediated endocytosis 
[Moroianu, J. and Riordan, J. F. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 1677-1681] and nuclear 
localization sequence-assisted nuclear import [Moroianu, J. and Riordan, J. F. (1994) Biochem. Biophys. 
Res. Commun. 203, 1765-1772]. 

While angiogenesis is a tightly controlled process under usual physiological conditions, abnormal 
angiogenesis can have devastating consequences in pathological conditions such as arthritis, diabetic 
retinopathy and tumor growth. It is now well established that the growth of virtually all solid tumors is 
angiogenesis dependent [Folkman, J. (1989) J. Natl. Cancer Inst. 82, 4-6]. Angiogenesis is also a 
prerequisite for the development of metastasis, since it provides the means whereby tumor cells 
disseminate from the original primary tumor and establish at distant sites [Mahadevan, V. and Hart, I. R. 
(1990) Rev. Oncol. 3, 97-103; Blood, C. H. and Zetter B. R. (1990) Biochim. Biophys. Acta 1032, 89-118]. 
Therefore, interference with the process of tumor-induced angiogenesis can be an effective therapy 
for both primary and metastatic cancers. 

Although originally isolated from medium conditioned by human colon cancer cells (Fett et al. 
(1985), supra), and subsequently shown to be produced by several other histological types of human 
tumors [Rybak, S. M.; Fett, J. W.; Yao, Q-Z.; and Vallee, B. L. (1987) Biochem. Biophys. Res. Commun. 
146, 1240-1248; Olson, K. A.; Fett, J. W.; French, T. C.; Key, M. E.; and Vallee, B. L. (1995) Proc. Natl. 
Acad. Sci. U.S.A. 92, 442-446], angiogenin is also a constituent of human plasma and normally circulates 
at a concentration of 250-360 ng/ml [Shimoyama, S.; Gansauge, F.; Gansauge, S.; Negri, G.; Oohara, T.; 
and Beger, H. G. (1996) Cancer Res. 56, 2703-2706; Blaser, J.; Triebl, S.; Kopp, C.; and Tschesche, H. 
(1993) Eur. J. Clin. Chem. Clin. Biochem. 31, 513-516]. 

Several inhibitors of the functions of angiogenin have been developed. These include: (i) 
monoclonal antibodies (mAbs) [Fett, J. W.; Olson, K. A.; and Rybak, S. M. (1994) Biochemistry 33, 
5421-5427], (ii) an angiogenin-binding protein [Hu, G-F.; Chang, S-I.; Riordan, J. F.; and Vallee, B. L. 
(1991) Proc. Natl. Acad. Sci. U.S.A. 88, 2227-2231; Hu, G-F.; Strydom, D. J.; Fett, J. W.; Riordan, J. F.; 
and Vallee, B. L. (1993) Proc. Natl. Acad. Sci. U.S.A. 90, 1217-1221; Moroianu, J.; Fett, J. W.; Riordan, 
J. F.; and Vallee, B. L. (1993) Proc. Natl. Acad. Sci. U.S.A. 90, 3815-3819], (iii) the placental 
ribonuclease inhibitor (PRI) [Shapiro, R. and Vallee, B. L. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 
2238-2241], (iv) peptides synthesized based on the C-terminal sequence of angiogenin [Rybak, S. M.; Auld, D. 
S.; St. Clair, D. K.; Yao, Q-Z.; and Fett, J. W. (1989) Biochem. Biophys. Res. Commun. 162, 535-543], 
and (v) inhibitory site-directed mutagenesis of angiogenin [Shapiro, R. and Vallee, B. L. (1989) 
Biochemistry 28, 7401-7408]. 

The subject invention provides the protein/polypeptide of SEQ ID NO: 254. The invention also 
provides biologically active fragments of SEQ ID NO: 254. In one embodiment, the polypeptides of SEQ 
ID NO: 254 are interchanged with the corresponding polypeptides encoded by the human cDNA of clone 
106-006-1-0-E3-CS. "Biologically active fragments" are defined as those peptide or polypeptide 
fragments having at least one of the biological functions of the full length protein (e.g., stimulation of 
angiogenesis). Compositions of the protein/polypeptide of SEQ ID NO: 254, or biologically active 
fragments thereof, are also provided by the subject invention. These compositions may be made 
according to methods well known in the art. 

The invention also provides variants of the protein of SEQ ID NO: 254. These variants have at 
least about 80%, more preferably at least about 90%, and most preferably at least about 95% amino acid 
sequence identity to the amino acid sequence encoded by SEQ ID NO: 254. Variants according to the 
subject invention also have at least one functional or structural characteristic of the protein of SEQ ID 
NO: 254. The invention also provides biologically active fragments of the variant proteins. 
Compositions of variants, or biologically active fragments thereof, are also provided by the subject 
invention. These compositions may be made according to methods well known in the art. Unless 
otherwise indicated, the methods disclosed herein can be practiced utilizing the protein encoded by SEQ 
ID NO: 254, biologically active fragments of SEQ ID NO: 254, variants of SEQ ID NO: 254, and 
biologically active fragments of the variants. 
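
The identity thresholds recited above (about 80%, 90% and 95%) can be illustrated with a minimal sketch that computes percent identity over two sequences that have already been aligned. The text does not specify how identity is to be calculated, so the convention used here (matches divided by the number of alignment columns, ignoring columns that are gaps in both sequences) is an assumption, as are the function names and the toy sequences, which are not taken from SEQ ID NO: 254.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over two pre-aligned sequences of equal length.

    Assumed convention: identical residues / alignment columns, where
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no alignment columns to compare")
    return 100.0 * matches / compared


def identity_tier(pct, tiers=(95, 90, 80)):
    """Return the highest threshold met by pct, or None if below all of them."""
    for threshold in tiers:
        if pct >= threshold:
            return threshold
    return None


# Hypothetical ten-residue sequences differing at a single position.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
pct = percent_identity(reference, variant)
tier = identity_tier(pct)
print(pct, tier)
```

Other conventions, for example dividing by the length of the shorter sequence, give different values for gapped alignments, so the convention should be stated whenever a threshold is applied.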

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode 
the amino acid sequence of SEQ ID NO: 254. In a preferred embodiment, SEQ ID NO: 254 is encoded 
by clone 106-006-1-0-E3-CS. It is well within the skill of a person trained in the art to create these 
alternative DNA sequences which encode proteins having the same, or essentially the same, amino acid 
sequence. These variant DNA sequences are, thus, within the scope of the subject invention. As used 
herein, reference to "essentially the same" sequence refers to sequences that have amino acid 
substitutions, deletions, additions, or insertions that do not materially affect biological activity. 
Fragments retaining one or more characteristic biological activity of the protein encoded by clone 
106-006-1-0-E3-CS are also included in this definition. 

"Recombinant nucleotide variants" are alternate polynucleotides which encode a particular 
protein. They can be synthesized, for example, by making use of the "redundancy" in the genetic code. 
Various codon substitutions, such as the silent changes which produce specific restriction sites or codon 
usage-specific mutations, can be introduced to optimize cloning into a plasmid or viral vector or 
expression in a particular prokaryotic or eukaryotic host system, respectively. 
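
As an illustration of the silent codon changes described above, the following sketch translates two DNA sequences with the standard genetic code and shows that they encode the same peptide while only one of them contains a GGATCC (BamHI) recognition site. The nine-nucleotide sequences are hypothetical examples chosen for brevity and are not taken from clone 106-006-1-0-E3-CS; the helper name `translate` is likewise an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (reading frame 1) into one-letter residues."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical sequences: both encode Met-Gly-Ser.
original = "ATGGGTTCT"
silent_variant = "ATGGGATCC"  # GGT->GGA and TCT->TCC are silent changes

print(translate(original), translate(silent_variant))
print("GGATCC" in original, "GGATCC" in silent_variant)
```

The same approach extends to codon-usage optimization, where each codon is replaced by the synonymous codon preferred in the chosen host; that step needs a host-specific usage table, which is not shown here.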

In one aspect of the subject invention, SEQ ID NO: 254, and variants thereof, can be used to 
generate polyclonal or monoclonal antibodies. Both biologically active and immunogenic fragments of 
SEQ ID NO: 254, or variant proteins, can be used to produce antibodies. Polyclonal and/or monoclonal 
antibodies can be made according to methods well known to the skilled artisan. Antibodies produced in 
accordance with the subject invention can be used in a variety of detection assays known to those skilled 
in the art. The antibodies may be used to agonize or antagonize the biological activity of the protein of 
SEQ ID NO: 254. 

SEQ ID NO: 254 can be used as a marker for individuals at risk for the development or 
recurrence of tumors. As indicated supra, angiogenin is found at certain levels in normal individuals, 
normally at concentrations of 250-360 ng/ml. Thus, quantitative immunoassays can be used for the 
detection of abnormal levels of SEQ ID NO: 254, thereby identifying those individuals at risk for the 
development of tumors. Alternatively, the subject invention provides antibodies specific for SEQ ID NO: 
254, or fragments thereof, which are used in routine immunoassays to screen for the presence or absence 
of SEQ ID NO: 254, or fragments thereof. 

Alternatively, the nucleic acids which encode SEQ ID NO: 254, or fragments thereof, may be 
used in hybridization assays to detect and/or quantitate the expression of SEQ ID NO: 254. Such 
hybridization assays are well known to the skilled artisan and can be practiced on a variety of samples, 
including, but not limited to, tumor cells, biopsied tissues, or normal tissue. 
30 Molecules (see Strydom, D. J., (1998) Cell. Mol. Life Sci. 54, 811-824) that functionally inhibit 

the action of angiogenin can be used to treat patients with tumors. Because angiogenin is required for the 
vascularization of tumors, molecules which inhibit the biological activity of angiogenin can be used to 
reduce tumor vascularization and control tumor growth. Thus, another aspect of the invention provides 
molecules which inhibit, or reduce, the biological activity of SEQ ID NO: 254. One embodiment 
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provides neutralizing antibodies to inhibit the biological activity of SEQ ID NO: 254. These neutralizing 
antibodies may be chimeric or humanized, according to methods well known in the art, to minimize the 
immunogenicity of the molecules when used in patients. Neutralizing antibodies may be used in 
conjunction with other known therapeutic modalities for the treatment of tumors. 
5 Another embodiment of the invention utilizes the concept that expression of specific genes can be 

suppressed by oligonucleotides having a nucleotide sequence complementary to the mRNA transcript of 
the target gene. This suppression occurs by selectively impeding translation and has been termed an 
"antisense" methodology. In addition, "antigene" or "triplex" methodologies may also suppress 
expression of genes by using an oligonucleotide which is complementary to a selected site of double 

10 stranded DNA, thereby forming a triple-stranded complex to selectively inhibit transcription of the gene. 
Both "antisense" and "antigene" methodologies can be used to inhibit or reduce the expression of the 
gene of SEQ ID NO: 254, and thereby provide therapeutic benefit to the patient being treated. Methods 
of treating individuals using antigene and antisense methodologies are well known to those skilled in the 
art (see, for example, "Antisense Therapeutics", Agrawal, S. (ed), Humana Press, 1996; Crooke, S. T., and
Bennett, C. F. (1996) Annu. Rev. Pharmacol. Toxicol. 36, 107-129; "Prospects for the Therapeutic Use of
Antigene Oligonucleotides", Maher, L. J. (1996) Cancer Investigation 14(1), 66-82, each hereby
incorporated by reference in its entirety).
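
By way of illustration only, the complementarity on which the antisense methodology relies can be sketched in a few lines of Python. The sketch assumes a simple Watson-Crick pairing rule and a DNA oligonucleotide directed against an RNA target. The function name, the toy transcript and the window coordinates are hypothetical and are not taken from SEQ ID NO: 254 or from any sequence of the present application.

```python
# Watson-Crick partners, written so that an RNA target yields a DNA oligonucleotide.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna: str, start: int, length: int) -> str:
    """Return the DNA oligo (5'->3') complementary to a window of an mRNA.

    `start` is a 1-based position on the mRNA; the window spans `length` bases.
    """
    window = mrna.upper()[start - 1 : start - 1 + length]
    if start < 1 or len(window) != length:
        raise ValueError("window falls outside the mRNA sequence")
    # Reverse the window so the oligo reads 5'->3', then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(window))


# Hypothetical toy transcript; the window begins at its AUG initiation codon.
toy_mrna = "GGCAUGGCUAACUGA"
oligo = antisense_oligo(toy_mrna, start=4, length=9)
print(oligo)
```

Such a calculation only yields a candidate sequence. Actual selection of a target site and of backbone chemistry would follow the methods of the references cited above.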

As additional examples, U.S. Pat. No. 5,098,890 is directed to antisense oligonucleotides 
complementary to the c-myb oncogene and antisense oligonucleotide therapies for certain cancerous
conditions. U.S. Pat. No. 5,135,917 provides antisense oligonucleotides that inhibit human interleukin-1
receptor expression. U.S. Pat. No. 5,087,617 provides methods for treating cancer patients with antisense 
oligonucleotides. U.S. Pat. No. 5,166,195 provides oligonucleotide inhibitors of HIV. U.S. Pat. No. 
5,004,810 provides oligomers capable of hybridizing to herpes simplex virus Vmw65 mRNA and 
inhibiting replication. U.S. Pat. No. 5,194,428 provides antisense oligonucleotides having antiviral
activity against influenza virus. U.S. Pat. No. 4,806,463 provides antisense oligonucleotides and methods
using them to inhibit HTLV-III replication. U.S. Pat. No. 5,286,717 is directed to mixed linkage
oligonucleotide phosphorothioates complementary to an oncogene. U.S. Pat. No. 5,276,019 and U.S. Pat.
No. 5,264,423 are directed to phosphorothioate oligonucleotide analogs used to prevent replication of 
foreign nucleic acids in cells. Each of these patents is hereby incorporated by reference in its entirety. 

The subject invention also provides modified/derivatized nucleic acids encoding SEQ ID NO:
254. These include those modifications which increase the stability and/or affinity of these compounds
for targets. Phosphorothioate analogs of oligodeoxynucleotides (ODNs), in which nonbridging
phosphoryl oxygens in the backbone of DNA are substituted with sulfur ([S]ODNs), are substantially
more stable than their native phosphodiester counterparts. Other derivatives, such as those alkylated on
sugar oxygen groups, show enhanced target affinity. [S]ODNs possess good biological activity,
pharmacology, pharmacokinetics and safety in vivo (Agrawal (1996), supra). Successful inhibition of 
specific gene function has been achieved by targeting various sites on specific mRNA sequences that 
include the AUG translational initiation codon, 5'-transcriptional start site, 3'-termination codon and
sequences in both the 5'- and 3'-untranslated regions. These derivatized nucleic acids can be used in any
of the aforementioned methodologies. 

Protein of SEQ ID NO: 387 (internal designation 105-073-2-0-A7-CS)

The protein of SEQ ID NO: 387 encoded by the cDNA of SEQ ID NO: 146 is expressed in liver,
ovary and prostate, and is overexpressed in salivary glands. The protein of SEQ ID NO: 387 belongs to the
abhydrolase family, and is characterized by the alpha/beta hydrolase fold (Protein Eng 1992;5:197-211,
which disclosure is hereby incorporated by reference in its entirety), that is common to a number of 
hydrolytic enzymes of widely differing phylogenetic origin and catalytic function. 

The core of each enzyme is an alpha/beta-sheet (rather than a barrel), containing 8 strands connected
by helices. The enzymes are believed to have diverged from a common ancestor, preserving the
arrangement of the catalytic residues. All have a catalytic triad, the elements of which are borne on loops,
which are the best conserved structural features of the fold. 

Epoxide hydrolases are a family of enzymes which hydrolyze a variety of exogenous and
endogenous epoxides to their corresponding diols. Epoxide hydrolases add water to epoxides, forming
the corresponding diols. On the basis of sequence similarity, it has been proposed that the mammalian
soluble epoxide hydrolase contains 2 evolutionarily distinct domains: the N-terminal domain is similar to
bacterial haloacid dehalogenase, while the C-terminal domain is similar to soluble plant epoxide
hydrolase, microsomal epoxide hydrolase, and bacterial haloalkane dehalogenase (DNA Cell Biol.
14:61-71 (1995), which disclosure is hereby incorporated by reference in its entirety). Human epoxide
hydrolase catalyzes the addition of water to epoxides to form the corresponding dihydrodiol. The
enzymatic hydration is essentially irreversible and produces mainly metabolites of lower reactivity that
can be conjugated and excreted. The reaction of epoxide hydrolase is therefore generally regarded as
detoxifying. Commonly, the action of epoxide hydrolase is followed by excretion of the diols.
However, reactivation of certain diols by a second epoxidation may occur. Epoxide hydrolase
also inactivates the epoxides arising in the metabolism of endogenous compounds. Lipophilic
xenobiotics tend to accumulate in tissues, and they must be transformed into water-soluble compounds to
enable excretion. In this transformation process reactive intermediates are produced. If
biotransformation fails to detoxify these reactive intermediates, they may react covalently with critical
targets like the genetic material, or start harmful reaction chains like lipid peroxidation. Therefore,
epoxide hydrolases are thought to be responsible for carcinogenicity and mutagenicity phenomena (Exp
Pathol 1990;39(3-4):195-6). In addition, the interaction between epoxide hydrolase activity and
alcohol-metabolizing enzymes suggests that epoxide hydrolase activity may be associated with the susceptibility to
alcoholic liver disease and hepatocellular carcinoma (Toxicol. Lett. 10;115(1):17-22 (2000), which
disclosure is hereby incorporated by reference in its entirety). Compounds containing the epoxide
functionality have become common environmental contaminants because of their wide use as pesticides, 
sterilants, and industrial precursors. Such compounds also occur as products, by-products, or 
intermediates in normal metabolism and as the result of spontaneous oxidation of membrane lipids (see,
e.g., Brash, et al., Proc. Natl. Acad. Sci., 85:3382-3386 (1988), and Sevanian, A., et al., Molecular Basis
of Environmental Toxicology (Bhatnager, R. S., ed.) pp. 213-228, Ann Arbor Science, Michigan (1980)).
As three-membered cyclic ethers, epoxides are often very reactive and have been found to be cytotoxic,
mutagenic and carcinogenic (see, e.g., Sugiyama, S., et al., Life Sci. 40:225-231 (1987)). Cleavage of the
ether bond in the presence of electrophiles often results in adduct formation. As a result, epoxides have
been implicated as the proximate toxin or mutagen for a large number of xenobiotics. Reactions of
detoxification using epoxide hydrolases typically decrease the hydrophobicity of a compound, resulting in
a more polar and thereby excretable substance.

It is believed that the protein of SEQ ID NO: 387 or part thereof is a hydrolase, preferably an
epoxide hydrolase. Preferred polypeptides of the invention are polypeptides comprising the amino acids
of SEQ ID NO: 387 from positions 2 to 132, 52 to 137, 29 to 120, 12 to 137, 19 to 136, 151 to 209, 141 to
209, 30 to 108, and 35 to 108. Other preferred polypeptides of the invention are fragments of SEQ ID
NO: 387 having any of the biological activities described herein. The hydrolytic activity of the protein of
the invention or part thereof may be assayed using any of the assays known to those skilled in the art
including those described in Cancer Res 40(7):2552-6 (1980); Exp Pathol 39(3-4):195-6 (1990), which
disclosures are hereby incorporated by reference in their entireties.
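
As a purely illustrative aid, the position ranges recited above can be read as 1-based, inclusive residue coordinates, an assumption made here for the sake of the example. The Python sketch below extracts the corresponding fragments from a placeholder sequence. The placeholder is an arbitrary string and is not the sequence of SEQ ID NO: 387. The function and variable names are likewise hypothetical.

```python
# Position ranges recited for SEQ ID NO: 387, read as 1-based inclusive coordinates.
PREFERRED_RANGES_387 = [
    (2, 132), (52, 137), (29, 120), (12, 137), (19, 136),
    (151, 209), (141, 209), (30, 108), (35, 108),
]


def fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a polypeptide sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range falls outside the sequence")
    return sequence[start - 1 : end]


# Arbitrary 220-residue placeholder, NOT the actual sequence of SEQ ID NO: 387.
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 11

fragments = {(s, e): fragment(placeholder, s, e) for s, e in PREFERRED_RANGES_387}
lengths = [len(f) for f in fragments.values()]
print(lengths)
```

With this reading, a range "from positions 2 to 132" denotes a fragment of 131 residues.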

The invention also relates to methods and compositions using the protein of the invention or part
thereof to diagnose, prevent and/or treat several disorders linked to overexpression of the protein of the
invention including alcoholic liver disease, hepatocellular carcinoma, ovarian and prostate cancers. 

In addition, the protein of the invention or part thereof may be used to identify inhibitors for 
mechanistic and clinical applications. Such inhibitors may then be used to identify or quantify the protein
of the invention in a sample, and to diagnose, treat or prevent any of the disorders where the protein's
hydrolytic activity is undesirable and/or deleterious such as disorders characterized by tissue degradation 
including but not limited to amyloidosis, colitis, lysosomal diseases, arthritis, muscular dystrophy, 
inflammation, tumor invasion, glomerulonephritis, parasite-borne infections, Alzheimer's disease, 
periodontal disease, and cancer metastasis.

In another embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify tissues, preferably ovarian, liver or
prostate, more preferably salivary glands. For example, the protein of the invention or part thereof may be
used to synthesize specific antibodies using any techniques known to those skilled in the art. Such
tissue-specific antibodies may then be used to identify tissues of unknown origin, for example, forensic samples,
differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate different tissue
types in a tissue cross-section using immunochemistry.

Protein of SEQ ID NO: 398 (internal designation: 160-31-3-0-E4-CS)

The protein of SEQ ID NO: 398 encoded by the cDNA of SEQ ID NO: 157 is overexpressed in
fetal brain and shows homology with diverse hydrolases. The protein of the invention also displays a
motif characteristic of isochorismatase proteins from positions 17 to 147. In addition, the protein of the
invention is an alternatively spliced form of an unnamed human protein.

It is believed that the protein of SEQ ID NO: 398 or part thereof is a hydrolase, preferably acting
on ether bonds, more preferably an ether hydrolase. Preferred polypeptides of the invention are
polypeptides comprising the amino acids of SEQ ID NO: 398 from positions 17 to 147. Other preferred
polypeptides of the invention are fragments of SEQ ID NO: 398 having any of the biological activities
described herein. The hydrolytic activity of the protein of the invention or part thereof may be assayed
using any of the assays known to those skilled in the art including those described in US patents
5,445,942; 5,445,956; 6,017,746 and 5,871,616 and in Rusnak et al., Biochemistry 29:1425-1435 (1990).

In another embodiment, the invention relates to methods and compositions using the protein of
the invention or part thereof as a marker protein to selectively identify tissues, preferably fetal brain. For
example, the protein of the invention or part thereof may be used to synthesize specific antibodies using any
techniques known to those skilled in the art including those described herein. Such tissue-specific
antibodies may then be used to identify tissues of unknown origin, for example, forensic samples,
differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate different tissue
types in a tissue cross-section using immunochemistry.

Proteins of SEQ ID NOs: 260 and 265 (internal designations 116-004-3-0-A6-CS and 116-091-1-0-D9-CS,
respectively)

The protein of SEQ ID NO: 260, encoded by the cDNA of SEQ ID NO: 19 and overexpressed in
liver and testis, is an isoform of the protein of SEQ ID NO: 265, encoded by the cDNA of SEQ ID NO: 24
and overexpressed in liver. Both proteins show homology to murine EPCS26 (Hemberger M. et al., Dev.
Biol. 222, 158-169 (2000)) with Genbank accession number AF250838. The proteins of SEQ ID NO: 260
and 265 contain a signal peptide (cleavage site at position 18) that could allow the export of the protein to
the extracellular domain or to a cellular membrane, or define a particular subcellular
localization. The cDNA encoding EPCS26 has been shown to be differentially expressed during the
process of trophoblast invasion.

Implantation and placentation are key processes in mammalian embryonic development. They
physically connect the embryo to its mother and are critical for sufficient nutrient and gas exchange. The
extraembryonic cell lineage is the first to differentiate in the developing conceptus, reflecting the
importance of this cell lineage for the establishment of fetal-maternal connections. During murine development,
the outer layer of the blastocyst, the mural trophectoderm, begins to differentiate into primary trophoblast
giant cells on day 5 of gestation (e5). These cells invade the uterine epithelium and penetrate deeply into
the stroma. At the same time, the polar trophectoderm cells continue to proliferate and form the 
ectoplacental cone. On e7, the outer cells of the ectoplacental cone begin to differentiate into secondary 
trophoblast giant cells. The invasion of uterine stroma by these cells is critical for successful placentation 
(Cross et al., Science 266, 1508-1518 (1994)). 

Trophoblast invasion triggers secretion of proteinases that degrade extracellular matrix molecules.
Mouse trophoblasts have been shown to synthesize and secrete serine proteases, matrix
metalloproteinases and cysteine proteinases. Invasion of the trophoblast is a highly controlled process.
The decidua restricts invasion by secreting proteinase inhibitors. Proteinases and proteinase inhibitors
have antagonistic functions in implantation and placentation which may be mirrored by the reciprocity of
their expression patterns (Alexander et al., Development 122, 1723-1736 (1996)).

During tumor invasion and metastasis, the degradation of the basement membranes is often 
accomplished by the proteinases implicated in implantation and normal trophoblast invasion (Strickland 
and Richards Cell 71, 355-357 (1992), Wilson et al. Proc. Natl. Acad. Sci. USA 94, 1402-1407 (1997)). 
Uncontrolled trophoblast invasion, as in choriocarcinomas, results in one of the most metastatic tumors
known (Strickland and Richards Cell 71, 355-357 (1992)).

Deficient function of the protein of the invention could result in uncontrolled trophoblast
invasion which, as in choriocarcinomas, results in one of the most metastatic tumors known (Strickland and
Richards Cell 71, 355-357 (1992)).

It is believed that the proteins of SEQ ID NO: 260 and 265 or part thereof play a role in
proteolysis, preferably during embryogenesis, more preferably during trophoblast invasion. The proteins
of the invention or part thereof may act as secreted proteinases that degrade extracellular matrix
molecules or, on the contrary, as proteinase inhibitors. Preferred polypeptides of the invention are
polypeptides comprising the amino acids of SEQ ID NO: 260 from positions 7 to 122 and the amino acids
of SEQ ID NO: 265 from positions 7 to 81. Other preferred polypeptides of the invention are fragments
of SEQ ID NO: 260 and 265 having any of the biological activities described herein. The proteolytic
activity of the proteins of the invention or part thereof may be assayed using any of the assays known to
those skilled in the art including those described in US patents 6,069,229 and 5,861,267. The protease
inhibitor activity of the proteins of the invention or part thereof may be assayed using any of the assays
known to those skilled in the art and using methods for determining inhibition constants well known to
those skilled in the art (see Fersht, ENZYME STRUCTURE AND MECHANISM, 2nd ed., W.H.
Freeman and Co., New York, (1985)).

In addition, the proteins of the invention or part thereof may be used to diagnose, treat or prevent
any of the disorders characterized by undesirable and/or deleterious hydrolytic activity such as disorders
characterized by tissue degradation including but not limited to amyloidosis, colitis, lysosomal diseases,
arthritis, muscular dystrophy, inflammation, tumor invasion, glomerulonephritis, parasite-borne
infections, Alzheimer's disease, periodontal disease, cancer metastasis, and choriocarcinoma. For
diagnostic purposes, the expression of the proteins of the invention could be investigated using any of the
Northern blotting, RT-PCR or immunoblotting methods described herein and compared to the expression
in control individuals. Alternatively, inhibitors of the proteins' activity may be developed and used to
inhibit and/or reduce their activity using any methods known to those skilled in the art. Overexpression of
the proteins of the invention or part thereof may be achieved using any of the gene therapy methods
described herein.

In another embodiment, the invention relates to methods and compositions using the proteins of
the invention or part thereof as marker proteins to selectively identify tissues, preferably liver and testis
for the protein of SEQ ID NO: 260, preferably liver for the protein of SEQ ID NO: 265. For example, the
proteins of the invention or part thereof may be used to synthesize specific antibodies using any techniques
known to those skilled in the art including those described herein. Such tissue-specific antibodies may
then be used to identify tissues of unknown origin, for example, forensic samples, differentiated tumor
tissue that has metastasized to foreign bodily sites, or to differentiate different tissue types in a tissue
cross-section using immunochemistry.

Protein of SEQ ID NO: 265 (internal designation 116-088-4-0-A9-CS)

The protein of SEQ ID NO: 265 encoded by the cDNA of SEQ ID NO: 24 is overexpressed in
testis and liver. This protein of the invention is homologous to the GdX protein, also named UBL4
(Toniolo et al., Proc Natl Acad Sci USA 1988;85:851-5), found in both human (GENPEPT accession
number L44140) and mouse (GENPEPT accession number J04761). In addition, the
174-amino-acid-long protein of SEQ ID NO: 265, which is similar in size to ubiquitin-like proteins, displays a pfam
consensus domain from position 1 to 82 that is the hallmark of ubiquitin family proteins.

Ubiquitin is a protein of 76 amino acid residues, found in all eukaryotic cells, and which is
extremely well conserved from protozoans to vertebrates (Jentsch et al. Trends Cell Biol 2000;10:335-42).
It plays a key role in a variety of cellular processes, such as ATP-dependent selective degradation of
cellular proteins, maintenance of chromatin structure, regulation of gene expression, stress response,
ribosome biogenesis, cell-cycle progression, signal transduction, transcription and antigen presentation
(Wilkinson et al. Annu Rev Nutr 1995;15:161-89). The first ubiquitin is covalently ligated to target
proteins through an isopeptide linkage between the C-terminal glycine residue of ubiquitin and an internal
ε-amino group of a lysine residue of the substrate. To generate an efficient proteasomal targeting signal,
additional ubiquitins are linked to the first one by isopeptide bonds, and form branched poly-ubiquitin
complexes (Thrower et al. EMBO J 2000;19:94-102). Covalent binding of ubiquitin to proteins marks
them for subsequent degradation by a multicomponent enzymatic complex known as the 26S proteasome
(Hershko et al. Annu Rev Biochem 1992;61:761-807).

The genes coding for ubiquitin-like proteins fall into two separate classes (Hershko et al. Annu Rev
Biochem 1992;61:761-807). Proteins of the first class are frequently designated as ubiquitin-like modifiers,
or UBLs. They produce polyubiquitin molecules consisting of exact head-to-tail repeats of ubiquitin, with
a variable number of repeats. These linear polymers of ubiquitin are linked covalently through peptide
bonds between the C-terminal glycine residue and N-terminal lysine residue of contiguous ubiquitin
molecules. Proteins of the second class are usually named ubiquitin-domain proteins, or UDPs.
These proteins bear a single N-terminal domain that is related to ubiquitin, fused to a
C-terminal ribosomal domain consisting of 52 or 76-80 amino-acid residues (Finley et al. Nature
1989;338:394-401). These proteins are not conjugated to other proteins and function as a heterogeneous
group of proteins. To date, this family includes RAD23, DSK2, PLIC-1, PLIC-2/Chap1, XDRP1, BAG-1,
BAT3/Chap2, Scythe, Parkin, UIP28, UBP6, Elongin B, and GdX. In addition, the protein of the invention of
SEQ ID NO: 265 clearly belongs to the UDP family, as it displays a single ubiquitin N-terminal
consensus domain, which is the hallmark of this protein family subset.

UDPs participate in the regulation of proteolysis through multiple mechanisms such as interaction
with catalytically active 26S proteasome for RAD23 (Schauber et al. Nature 1998;391:715-8), hPLIC-1
and hPLIC-2 (Kleijnen et al. Mol Cell 2000;6:409-19), and BAG-1 (Luders et al. J Biol Chem
2000;275:4613-7), removing ubiquitin from conjugates for UBP6 (Wyndham et al. Protein Sci
1997;8:1268-75) and negative regulation of multi-ubiquitin chain assembly for RAD23 (Ortolan et al.
Nature Cell Biol 2000;2:601-8). In addition, an increasing body of evidence indicates that some UDPs
participate in other cellular functions such as protein folding (Luders et al. J Biol Chem 2000;275:4613-7),
apoptosis (Kaye et al. FEBS Lett 2000;467:348-55), and nucleotide-excision repair (de Laat et al. Genes
Dev 1999;13:768-785). UDP family proteins have been shown to be directly associated with the pathogenesis of
several diseases including xeroderma pigmentosum for RAD23 (Masutani et al. EMBO J 1994;13:1831-43),
and Parkinson's disease for parkin (Kitada et al. Nature 1998;392:605-8). In addition, involvement of
ubiquitin-like proteins or abnormal accumulation of ubiquitinated proteins has been found in multiple
human disorders. Most of them, but not all, involve the central nervous system, such as Alzheimer's disease (van
Leeuwen et al. Science 1998;279:242-7), diffuse Lewy body disease (Iseki et al. J Neurol Sci
1997;146:53-7), Huntington disease (Scherzinger et al. Cell 1997;90:549-58), and amyotrophic lateral
sclerosis (Leigh et al. Brain 1991;114:775-88). In most disorders, ubiquitinated proteins accumulate within
cells and form aggregates termed inclusion bodies that have a characteristic appearance on histological
examination. In addition, abnormal accumulation of ubiquitinated proteins has been found in von
Hippel-Lindau disease (Kamura et al. Proc Natl Acad Sci USA 2000;97:10430-5), and in the liver of alcoholic
hepatitis patients (Ohta et al. Lab Invest. 1988;59:848-56). Components of hepatocytes are released
into the circulation in alcoholic hepatitis (Sorbi et al. Am J Gastroenterol 1999;94:1018-22).

It is believed that the protein of SEQ ID NO: 265 or part thereof plays a role in the regulation of
proteolysis, preferably as a ubiquitin-like protein, more preferably as a ubiquitin-domain protein. In
addition, the protein of the invention may play a role in protein folding, apoptosis and nucleotide-excision
repair. Preferred polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID
NO: 265 from positions 1 to 82. Other preferred polypeptides of the invention are fragments of SEQ ID
NO: 265 having any of the biological activities described herein.

In an embodiment, the invention relates to compositions and methods using the protein of the
invention or part thereof to remove, identify or inhibit contaminating proteases in a sample.
Compositions comprising the polypeptides of the present invention may be added to biological samples as 
a "cocktail" with other protease inhibitors to prevent degradation of protein samples. The advantage of 
using a cocktail of protease inhibitors is that one is able to inhibit a wide range of proteases without 
knowing the specificity of any of the proteases. Using a cocktail of protease inhibitors also protects a
protein sample from a wide range of future unknown proteases which may contaminate a protein sample
from a vast number of sources. Such protease inhibitor cocktails (see for example the ready-to-use
cocktails sold by Sigma) are widely used in research laboratory assays to inhibit proteases capable of
degrading a protein of interest for which the assay is to be performed. For example, the protein of the
invention or part thereof is added to samples where proteolytic degradation by contaminating proteases is
undesirable. Alternatively, the protein of the invention or part thereof may be bound to a chromatographic
support, either alone or in combination with other protease inhibitors, using techniques well known in the 
art, to form an affinity chromatography column. A sample containing the undesirable protease is run 
through the column to remove the protease. Alternatively, the same methods may be used to identify new 
proteases.

Another embodiment of the invention relates to compositions and methods of using the protein of
the invention or part thereof to develop assays for the immunohistochemical detection of testicular malignant
tissue, as the protein is overexpressed in such tissue. For instance, this could be used for staging lymph
node testicular cancer dissemination using the techniques and methods detailed in Nazeer et al. Oncol Rep
(1998);5:1425-9. The ability to specifically visualize malignant tissues (and cells derived from the
tissues) is useful for numerous applications, including to determine the origin of, and to identify, e.g., cancerous
cells, as well as to facilitate the identification of particular cells and tissues for, e.g., the evaluation of
histological slides.

In another embodiment, the invention relates to compositions or methods using the protein of
SEQ ID NO: 265 or part thereof to diagnose, treat and/or prevent disorders including, but not limited to,
xeroderma pigmentosum, von Hippel-Lindau disease, alcoholic hepatitis, and neurodegenerative diseases
such as Alzheimer's disease, diffuse Lewy body disease, Huntington disease, and amyotrophic lateral
sclerosis. Detection of poly-ubiquitinated protein conjugates in biological samples, such as brain tissues
for the diagnosis of neurodegenerative disorders or liver and serum or plasma for the diagnosis of
alcoholic hepatitis, may be performed using antibodies or nucleic acids able to detect the expression of the
protein of the invention using immunohistochemistry, enzyme-linked immunosorbent assay (ELISA) or
any other technique known to those skilled in the art including Northern blotting, RT-PCR or
immunoblotting methods described herein as well as the technique described in Mimnaugh et al.
Electrophoresis 1999;20:418-28. The expression of the protein of the invention in patients' samples is
then compared to the expression in control individuals.

In still another embodiment, the invention relates to compositions or methods to treat, attenuate
and/or prevent disorders including, but not limited to, xeroderma pigmentosum, von Hippel-Lindau
disease, alcoholic hepatitis, and neurodegenerative diseases such as Alzheimer's disease, diffuse Lewy body
disease, Huntington disease, and amyotrophic lateral sclerosis using the protein of the invention, part
thereof, or any other compounds developed using the present protein, such as nucleic acids, antibodies, or
chemical substances. In a preferred embodiment, proteins or other compounds targeted against the
protein of the invention or part thereof may be used to treat, prevent and/or attenuate disorders in which
ubiquitin-like proteins or abnormal accumulation of ubiquitinated proteins has been found and can be
involved in pathogenesis of the disease. For instance, proteins or other compounds targeted against the
protein of SEQ ID NO: 265 can be administered to treat or attenuate symptoms of patients affected with
Alzheimer's disease or any other neurodegenerative disorder.

Protein of SEQ ID NO: 408 (internal designation 174-8-2-0-C10-CS)

The protein of SEQ ID NO: 408 encoded by the cDNA of SEQ ID NO: 167 found in salivary
gland and brain is homologous to a Drosophila melanogaster protein thought to be a transmembrane protein
(STR: Q9V641). The 345-amino-acid-long protein of SEQ ID NO: 408 displays the Rhomboid pfam
domain from positions 186 to 323 and is predicted to have six transmembrane domains from positions 101
to 121, 167 to 187, 204 to 224, 243 to 263, 273 to 293, and 298 to 318.
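
For illustration only, the coordinates recited above can be compared with one another in a short Python sketch. The sketch assumes that all positions are 1-based and inclusive. It computes the length of each predicted transmembrane segment and the number of its residues lying within the Rhomboid pfam domain. The variable and function names are hypothetical.

```python
# Coordinates recited for SEQ ID NO: 408, read as 1-based inclusive positions.
RHOMBOID_DOMAIN = (186, 323)
TM_SEGMENTS = [(101, 121), (167, 187), (204, 224), (243, 263), (273, 293), (298, 318)]


def overlap(a: tuple, b: tuple) -> int:
    """Return the number of positions shared by two inclusive ranges."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return max(0, high - low + 1)


# Length of each predicted transmembrane segment.
tm_lengths = [end - start + 1 for start, end in TM_SEGMENTS]
# Residues of each segment that fall inside the Rhomboid pfam domain.
tm_in_domain = [overlap(segment, RHOMBOID_DOMAIN) for segment in TM_SEGMENTS]

for segment, length, shared in zip(TM_SEGMENTS, tm_lengths, tm_in_domain):
    print(segment, length, shared)
```

Under this reading each predicted segment spans 21 residues, and the last four segments lie entirely within the recited domain.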

Rhomboid genes were identified in flies and in organisms as diverse as Arabidopsis, yeast, 
bacteria, and mammals. Human and rat homologues of Rhomboid have been identified (Pascal et al.:
1998; FEBS Lett. 429; 337-340). This very widespread conservation implies that the Rhomboid family
proteins have a fundamental function within many cells. The Drosophila Rhomboid has six
transmembrane domains and an amino terminal hydrophobic region like the protein of the invention.

The 355-amino-acid-long Drosophila Rhomboid protein is known to control many aspects of fly 
development and especially, to establish position along the dorsoventral axis and then again later to 
specify the fate of neuronal precursor cells. Rhomboid expression is sufficient to activate EGF receptor 
(EGFr) signaling in all tissues in Drosophila, while loss of Rhomboid mimics reduction (or loss) of EGFr
signaling in almost all tissues (Guichard et al.: 1999; Development; 126; 2663-2676). As in mammals, the
Drosophila EGF receptor controls many aspects of growth and development. Three activating ligands of
the Drosophila EGFr have been described, the most developmentally significant being the TGF alpha-like
molecule, Spitz (Rutledge et al.: 1992; Genes & Dev.; 6; 1503-1517). None of the Rhomboid-like
proteins from species other than Drosophila have clearly assigned functions. However, there is
compelling genetic evidence from Drosophila that Rhomboid has a key role in intercellular signaling: it
functions as an activator of the EGF receptor, probably by controlling the activation of the TGF-like ligand
Spitz (Guichard et al.: 1999; Development; 126; 2663-2676). Indeed, Rhomboid expression is the
principal rate-limiting step in activation of the Ras/MAP kinase pathway by the EGFr. 

Like mammalian TGF alpha, Spitz is synthesized as a functionally inert transmembrane protein;
subsequently, the proteolytic release of the extracellular portion of the molecule gives rise to a soluble
and potent EGFr ligand (Golembo et al.: 1996; Development; 122; 3363-3370). Unlike all other essential 
components of EGFr signaling, the expression of Rhomboid is tightly restricted to sites of signaling 
activity. It has been proposed that Rhomboid attains its key role in the pathway by regulating the 
proteolytic cleavage of Spitz (Wasserman et al.: 2000; Genes & Development; 14; 1651-1663). The
preeminence of Rhomboid in a pathway as critical to development and growth control as the
EGFr/Ras/MAP kinase cascade provides a strong incentive to understand its molecular mechanism. By
analogy to mammalian EGFr ligands that are similarly processed, Spitz cleavage is expected to be
catalyzed by an ADAM-like protease (Black et al.: 1998; Curr. Opin. Cell Biol.; 10; 654-659), but
Rhomboid resembles no known protease.

The Drosophila eye has served as a useful model for studying mechanisms of EGFr and Ras
signaling. At least five different roles for the receptor have been identified (for reviews see Wasserman et
al.: 2000; Genes & Development; 14; 1651-1663), the best characterized being its function in recruiting
cells into the developing ommatidium, the individual unit of the fly compound eye. Each ommatidium
contains eight photoreceptors, four cone cells that secrete lens material, and an average of eight pigment
cells. It has been shown that the fly EGFr has a role in regulating cell survival in the developing eye
(Dominguez et al.: 1998; Curr. Biol.; 8; 1039-1048).

The EGFr signaling pathway has been conserved between flies and vertebrates. The EGFr family 
consists of four members, HER1 (c-erbB1, EGFR), HER2 (c-erbB2), HER3 (c-erbB3) and HER4 (c-erbB4),
expressed in a wide range of cells (Gullick W.J. 1998; Br. Cancer Res. Treat.; 52, 43-53). TGF-alpha and
its homologs have been found to be the most abundant ligands for the EGF/TGF-alpha receptor in most
parts of the brain (Kaser, et al., (1992) Brain Res Mol Brain Res: 16:316-322). There appears to be a
widespread distribution of TGF-alpha in various regions of the brain, in contrast to EGF, which is only
present in smaller, more discrete areas, suggesting that TGF-alpha might play a physiological role in brain
tissues. These numerous receptor sites for TGF-alpha in the brain suggest that TGF-alpha has an important
utility in promoting normal brain cell differentiation and function.

Transforming growth factor alpha (TGF-alpha) is a relative of epidermal growth factor (EGF)
and, like EGF, it exerts its effects on cells through binding to the EGF receptor. The precise physiological
role of TGF-alpha is still not clear, although it appears to be important in eye and hair follicle
development and may play a role in both the immune system and in wound healing (see Kumar, et al.;
1995 Cell Biology International, 19:5, 373-388). The EGF receptor family currently includes four EGF
receptors. The EGFR2 receptor may also be referred to as ERB-2 and this molecule is useful for a variety
of diagnostic and therapeutic indications (Prigent, S. A., and Lemoine, N. R., (1992) Prog Growth Factor
Res., 4:1-24). TGF-alpha is likely a ligand for one or more of these receptors as well as for an as yet
unidentified new EGF-type receptor. Use of TGF-alpha can assist with the identification,
characterization and cloning of such receptors. For example, the EGF receptor gene represents the cellular
homolog of the v-erb-B oncogene of avian erythroblastosis virus. Overexpression of the EGF receptor or
deletion of kinase regulatory segments of the protein can bring about tumorigenic transformation of cells
(Manjusri, D. et al., (1991) Human Cytokines, 364 and 381).

The EGF receptor, and the related ErbB family of receptor tyrosine kinases, have indeed been
much implicated in human cancer. It is commonly believed that hyperactive receptor signaling
dysregulates growth control and is involved in the onset of malignancy, as well as in the disruption of
developmental programs. Very little, however, is known about ErbB physiological regulation in humans.
The fruitfly, Drosophila melanogaster, has a single receptor homologous to the four ErbB receptors. As
signaling mechanisms have been well conserved between flies and mammals, the results of experiments
in flies are relevant to the study of the human receptors in development and disease. Two areas of recent
progress are emphasized. First, a number of signal modulators have been identified, including three EGF
receptor inhibitors, several of which have human homologues. Second, the signaling molecules are
integrated into regulatory networks that specify the elaborate activation profiles needed in development
(positive and negative feedback control of EGF receptor signaling emerges as a central theme).

It is thus important to discover whether Rhomboid-like proteins also have functions similar to 
those observed in Drosophila in other higher organisms, including mammals, because of the substantial 
clinical importance of the EGFr pathway. 

It is believed that the protein of SEQ ID NO: 408 or part thereof plays a role in the control of
cellular signaling. Preferably the protein of the invention or part thereof plays a role in the activation of
EGFr-mediated cell signaling, probably through the control of the activation of EGFr ligands, such as
EGF, TGF alpha and TGF alpha-like factor, more probably through the proteolytic cleavage of such
ligands. Preferred polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID
NO: 408 from positions 186 to 323, 101 to 121, 167 to 187, 204 to 224, 243 to 263, 273 to 293, and 298
to 318. Other preferred polypeptides of the invention are fragments of SEQ ID NO: 408 having any of
the biological activities described herein. The proteolytic activity of the protein of the invention or part
thereof, as well as its involvement in the regulation of cellular signaling through the activation of EGFr, may be
assayed using any of the assays known to those skilled in the art.
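
The residue ranges listed above can be read as 1-based, inclusive positions along the polypeptide. The following is a minimal sketch, under that assumption, of how such fragments might be cut from a sequence string. The function name `fragment` and the placeholder sequence are hypothetical and introduced only for illustration; the placeholder is not SEQ ID NO: 408, which is not reproduced in this section.

```python
# Residue ranges quoted in the text, taken as 1-based inclusive positions (an assumption).
PREFERRED_RANGES = [
    (186, 323), (101, 121), (167, 187), (204, 224),
    (243, 263), (273, 293), (298, 318),
]


def fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a one-letter protein sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(f"range {start}-{end} lies outside a sequence of length {len(sequence)}")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return sequence[start - 1:end]


# Hypothetical placeholder of 340 residues; NOT the sequence of SEQ ID NO: 408.
PLACEHOLDER_SEQUENCE = "ACDEFGHIKLMNPQRSTVWY" * 17

# Map each quoted range to the corresponding fragment of the placeholder.
FRAGMENTS = {r: fragment(PLACEHOLDER_SEQUENCE, *r) for r in PREFERRED_RANGES}

if __name__ == "__main__":
    for (start, end), peptide in FRAGMENTS.items():
        print(f"{start}-{end}: {len(peptide)} residues")
```

With inclusive numbering, a range such as 101 to 121 spans 21 residues, which is why the slice ends at `end` rather than `end - 1`.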

An embodiment of the invention relates to compositions and methods using the protein of the
invention or part thereof to identify and/or quantify the activation of EGF receptors, preferably
vertebrate EGF receptors, more preferably human ErbB receptors, in a biological sample, and thus to its use
in assays and diagnostic kits for the quantification of such activation in bodily fluids, in tissue samples,
and in mammalian cell cultures. The assessment of the activation of EGF receptors may be performed
using any assay familiar to those skilled in the art. Preferably, a defined quantity of the protein of the
invention or part thereof is added to the sample under conditions allowing the activation of EGFr. Then,
the activation of EGFr is assayed and optionally compared to a control using any of the techniques known
to those skilled in the art.

The present invention also relates to diagnostic assays for detecting altered levels of the protein of
the present invention in various tissues, since over-expression of the protein compared to normal
control tissue samples can indicate the presence of certain disease conditions such as neoplasia, skin
disorders, ocular disorders and inflammation. Assays used to detect levels of the polypeptide of the
present invention in a sample derived from a host are well-known to those of skill in the art and include
radioimmunoassays, competitive-binding assays, Western Blot analysis and preferably ELISA assays.

This invention is also related to the use of SEQ ID No: 167 or its complement as a diagnostic
tool. Detection of a mutated form of the nucleotide sequence of SEQ ID No: 167 of the present invention
will allow a diagnosis of a disease or a susceptibility to a disease which results from underexpression of
the polypeptide of the present invention, for example, improper wound healing, improper neurological
functioning, ocular disorders, kidney and liver disorders, hair follicular development, angiogenesis and
embryogenesis. Individuals carrying mutations in the human nucleotide sequence of SEQ ID No: 167 of
the present invention may be detected at the DNA level by a variety of techniques. Nucleic acids for
diagnosis may be obtained from a patient's cells, such as from blood, urine, saliva, tissue biopsy and
autopsy material. The genomic DNA may be used directly for detection or may be amplified
enzymatically by using PCR (Saiki et al., (1986) Nature, 324:163-166) prior to analysis. RNA or cDNA
may also be used for the same purpose. As an example, PCR primers complementary to the nucleic acid
encoding a polypeptide of the present invention can be used to identify and analyze mutations thereof. For
example, deletions and insertions can be detected by a change in size of the amplified product in
comparison to the normal genotype. Point mutations can be identified by hybridizing amplified DNA to
radiolabeled RNA or, alternatively, radiolabeled antisense DNA sequences. Perfectly matched sequences
can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting
temperatures.
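
The size comparison described above for deletions and insertions can be illustrated with a short sketch. It assumes that amplicon lengths have already been measured in base pairs. The function name `classify_amplicon`, the `tolerance_bp` parameter (an allowance for sizing error) and the example lengths are hypothetical and are not taken from the present application.

```python
def classify_amplicon(observed_bp: int, reference_bp: int, tolerance_bp: int = 0) -> str:
    """Compare an observed PCR product size with the size expected for the normal genotype."""
    if observed_bp <= 0 or reference_bp <= 0:
        raise ValueError("amplicon sizes must be positive")
    if tolerance_bp < 0:
        raise ValueError("tolerance must not be negative")
    difference = observed_bp - reference_bp
    # Differences within the sizing tolerance are not treated as a change.
    if abs(difference) <= tolerance_bp:
        return "no size change"
    # A longer product suggests an insertion, a shorter product a deletion.
    return "insertion" if difference > 0 else "deletion"


if __name__ == "__main__":
    reference = 250  # hypothetical product size for the normal genotype
    for observed in (250, 238, 262):
        print(observed, classify_amplicon(observed, reference))
```

A size comparison of this kind cannot reveal point mutations, which leave the product length unchanged; those are addressed by the hybridization approaches mentioned in the text.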

In another embodiment, the protein of the invention or part thereof can be used to diagnose, treat
and/or prevent disorders linked to dysregulation of growth control, such as cancer and other disorders
relating to abnormal cellular differentiation, proliferation, or degeneration, including hyperaldosteronism,
hypocortisolism (Addison's disease), hyperthyroidism (Graves' disease), hypothyroidism, colorectal
polyps, gastritis, gastric and duodenal ulcers, ulcerative colitis, and Crohn's disease, and neurodegenerative
disorders such as Parkinson's and Alzheimer's diseases, using any methods and/or techniques described
herein. For diagnostic purposes, the expression of the protein of the invention could be investigated using
any of the Northern blotting, RT-PCR or immunoblotting methods described herein and compared to the
expression in control individuals. In addition, the protein of the invention or part thereof may be used to
evaluate the disease progression and the clinical treatment efficiency. Inhibition of expression of the
protein of the invention or part thereof to inhibit EGFr activation could be achieved by many means known
to those skilled in the art including those described in the present application such as antisense nucleotide
or triple helix strategies.

Protein of SEQ ID NO:291 (internal designation: 180-19-4-0-F4-CS)

The protein of SEQ ID No:291 encoded by the cDNA of SEQ ID No:50 is homologous to 
proteins of the tissue inhibitor of metalloproteinases (TIMP) family. The protein of the invention (207
amino-acids) is highly homologous to and appears to be a variant of the metalloproteinase inhibitor 1
precursor (TIMP-1, 207 amino-acids) human protein (SwissProt P01033). The protein of the invention is
strongly expressed in the liver, ovary and testis.

There are many different types of collagen found in the body and they, together with other
extracellular matrix components, such as elastin, gelatin, proteoglycan and fibronectin, make up a large
proportion of the body's extracellular tissue. Matrix metalloproteinases (MMPs) are enzymes that are 
involved in the degradation and denaturation of extracellular matrix components. Collagenases, for 
example, are MMPs that degrade or denature collagen. A large number of different collagenases are 
known to exist. These include interstitial collagenases, type IV-specific collagenases and collagenolytic
proteinases. Collagenases are generally specific for collagens which, in their full triple helix structure, are
extremely resistant to other enzymes. Other MMPs are involved in the degradation and denaturation of 
different extracellular matrix components, for example, elastin, gelatin and proteoglycan. Some MMPs 
are able to degrade or denature several different types of collagen and also other extracellular matrix 
components. For example, stromelysin degrades type IV collagen, which is found in basement
membrane, and also has an effect on other extracellular matrix components such as elastin, fibronectin
and cartilage proteoglycans. The ability of MMPs (such as collagenase, stromelysin,
and gelatinase) to degrade various components of connective tissue makes them potential targets for 
controlling numerous pathological processes. 

The presence of tissue inhibitors of MMPs has been observed in a variety of explants and in
monolayer cultures of mammalian connective tissue cells (Vater et al 1979 and Stricklin and Welgus
1983). Not only collagenase inhibitors but also inhibitors for other MMPs, for example, gelatinase and
proteoglycanase, have been found. MMP inhibitors are generally unable to bind the inactive (zymogen)
forms of the respective enzymes but complex readily with active forms (Murphy et al 1981). Tissue
MMP inhibitors are found, for example, in dermal fibroblasts, human lung, gingival, tendon and corneal
fibroblasts, human osteoblasts, uterine smooth muscle cells, alveolar macrophages, amniotic fluid,
plasma, serum and the alpha-granule of human platelets (Stricklin and Welgus 1983; Welgus et al 1985;
Welgus and Stricklin 1983; Bar-Sharvit et al 1985; Wooley et al 1976; and Cooper et al 1985).

The protein of the invention is a secreted TIMP-1 protein which tightly complexes with
metalloproteinases and irreversibly inactivates them. TIMP-1 has been identified as a secretory product of
platelets and alveolar macrophages.

Thus, an embodiment of the present invention relates to the use of the protein of the invention or 
a fragment thereof to inhibit the action of MMPs by directly inhibiting the enzyme activity like a 
conventional inhibitor. The inhibitory activity of an MMP inhibitor may be assessed by any method
suitable for determining inhibitory activity of a compound with respect to an enzyme. Such methods are
described in standard textbooks of biochemistry. 
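
One common way to express the outcome of such an enzyme assay is as percent inhibition, computed from the reaction rate measured with and without the inhibitor. The sketch below assumes that both rates were measured under identical conditions and in the same units. The function name `percent_inhibition` and the example rates are hypothetical; the present application does not prescribe any particular calculation.

```python
def percent_inhibition(rate_without_inhibitor: float, rate_with_inhibitor: float) -> float:
    """Percent reduction in enzyme activity caused by an inhibitor.

    Both rates must be in the same units (for example, substrate cleaved per minute).
    """
    if rate_without_inhibitor <= 0:
        raise ValueError("the uninhibited rate must be positive")
    if rate_with_inhibitor < 0:
        raise ValueError("the inhibited rate must not be negative")
    # Fraction of activity lost relative to the uninhibited control, scaled to percent.
    return 100.0 * (rate_without_inhibitor - rate_with_inhibitor) / rate_without_inhibitor


if __name__ == "__main__":
    # Hypothetical rates: 10.0 units/min without inhibitor, 2.5 units/min with inhibitor.
    print(percent_inhibition(10.0, 2.5))
```

Repeating the calculation over a series of inhibitor concentrations gives a dose-response curve from which potency can be estimated.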

In one embodiment, the protein of SEQ ID NO:291 can be used to treat and diagnose disorders 
associated with excessive MMP expression, such as inflammatory disorders such as rheumatoid arthritis,
osteoarthritis, osteopenias such as osteoporosis, pulmonary emphysema, periodontitis, gingivitis, corneal
epidermal or gastric ulceration, and tumour metastasis, invasion and growth, Paget's disease, 
hyperparathyroidism. MMP inhibitors are also of potential value in the treatment of neuroinflammatory 
disorders, including those involving myelin degradation, for example multiple sclerosis, as well as in the 
management of angiogenesis dependent diseases, which include arthritic conditions and solid tumour
growth as well as psoriasis, proliferative retinopathies, neovascular glaucoma, ocular tumours,
angiofibromas and hemangiomas. The present invention relates to a method of treating diseases in which 
MMPs are involved such as atherosclerotic plaque rupture, restenosis, aortic aneurysm (including 
abdominal aortic aneurysm and brain aortic aneurysm), congestive heart failure, left ventricular dilatation, 
myocardial infarction, decubital ulcers, chronic ulcers or wounds, renal disease, or other autoimmune or
inflammatory diseases dependent upon tissue invasion by leukocytes, Crohn's disease, acute respiratory
distress syndrome, asthma, chronic obstructive pulmonary disease, Alzheimer's disease, organ transplant 
toxicity, cachexia, allergic reactions, allergic contact hypersensitivity, epidermolysis bullosa, loosening of 
artificial joint implants, stroke, cerebral ischemia, head trauma, spinal cord injury, neuro-degenerative 
disorders (acute and chronic), Huntington's disease, Parkinson's disease, migraine, depression, peripheral
neuropathy, pain, cerebral amyloid angiopathy, nootropic or cognition enhancement, amyotrophic lateral
sclerosis, ocular angiogenesis, macular degeneration, abnormal wound healing, burns, diabetes, scleritis, 
AIDS, sepsis, septic shock. 

In another embodiment, the protein of SEQ ID NO: 291 has potential value in the treatment or 
diagnosis of atherosclerosis. The rupture of atherosclerotic plaques is the most common event initiating
coronary thrombosis. Destabilization and degradation of the extracellular matrix surrounding these
plaques by MMPs has been proposed as a cause of plaque fissuring. The shoulders and regions of foam
cell accumulation in human atherosclerotic plaques show locally increased expression of gelatinase B, 
stromelysin-1, and interstitial collagenase. In situ zymography of this tissue revealed increased 
gelatinolytic and caseinolytic activity (Galla, et al., J. Clin. Invest., 1994;94:2494-2503). In addition,
high levels of stromelysin RNA message have been found to be localized to individual cells in
atherosclerotic plaques removed from heart transplant patients at the time of surgery (Henney, et al., Proc. 
Natl Acad. Sci., 1991;88:8154-8158). 

In another embodiment, the protein of the invention has utility in treating or detecting 
degenerative aortic disease associated with thinning of the medial aortic wall. Increased levels of the
proteolytic activities of MMPs have been identified in patients with aortic aneurysms and aortic stenosis
(Vine N. and Powell J. T., Clin. Sci., 1991;81:233-239). 

In another embodiment, the protein of the invention can be used as a treatment or diagnostic tool 
for heart failure and associated ventricular dilatation. Heart failure arises from a variety of diverse
etiologies, but a common characteristic is cardiac dilation which has been identified as an independent
risk factor for mortality (Lee, et al., Am. J. Cardiol., 1993;72:672-676). This remodeling of the failing 
heart appears to involve the breakdown of extracellular matrix. MMPs are increased in patients with both 
idiopathic and ischemic heart failure (Reddy, et al., Clin. Res., 1993;41:660A; Tyagi S. C, et al., Clin. 
Res., 1993;41:681A). Animal models of heart failure have shown that the induction of gelatinase is
important in cardiac dilation (Armstrong, et al., Can. J. Cardiol., 1994;10:214-220), and cardiac dilation
precedes profound deficits in cardiac function (Sabbah, et al., Am. J. Physiol., 1992;263:H266-H270). 

In another embodiment, the protein of the invention is useful in treating or detecting neointimal 
proliferation, leading to restenosis, frequently developed after coronary angioplasty. The migration of 
vascular smooth muscle cells (VSMCs) from the tunica media to the neointima is a key event in the
development and progression of many vascular diseases and a highly predictable consequence of
mechanical injury to the blood vessel (Bendeck M. P., et al., Circulation Research, 1994;75:539-545). 
Northern blotting and zymographic analyses indicated that gelatinase A was the principal MMP expressed 
and excreted by these cells. Further, antisera capable of selectively neutralizing gelatinase A activity also 
inhibited VSMC migration across basement membrane barrier (Pauly R. R., et al., Circulation Research,
1994;75:41-54).

In another embodiment, the protein of the invention is used to ensure normal kidney function, 
which is dependent on the maintenance of tissues constructed from differentiated and highly specialized 
renal cells. Those cells are in a dynamic balance with their surrounding extracellular matrix (ECM) 
components (Davies M. et al., Kidney Int., 1992;41:671-678). Effective glomerular filtration requires
that a semi-permeable glomerular basement membrane (GBM) composed of collagens, fibronectin,
entactin, laminin and proteoglycans is maintained. A structural equilibrium is achieved by balancing the
continued deposition of ECM proteins with their degradation by specific MMPs. These proteins are first
secreted as proenzymes and are subsequently activated in the extracellular space. These proteinases are in
turn subject to counterbalancing regulation of their activity by naturally occurring inhibitors such as TIMPs.

Deficiency or defects in any component of the filtration barrier may have catastrophic
consequences for longer term renal function. For example, in hereditary nephritis of Alport's type,
associated with mutations in genes encoding ECM proteins, defects in collagen assembly lead to
progressive renal failure associated with splitting of the GBM and eventual glomerular and interstitial
fibrosis. In contrast, in inflammatory renal diseases such as glomerulonephritis, cellular proliferation of
components of the glomerulus often precedes obvious ultrastructural alteration of the ECM.
Cytokines and growth factors implicated in proliferative glomerulonephritis such as interleukin-1, tumor
necrosis factor, and transforming growth factor beta can upregulate metalloproteinase expression in renal
mesangial cells (Martin J. et al., J. Immunol., 1986;137:525-529; Marti H. P. et al., Biochem. J.,
1993;291:441-446; Marti H. P. et al., Am. J. Pathol., 1994;144:82-94). These metalloproteinases are
believed to be intimately involved in the aberrant tissue remodeling and cell proliferation characteristic of
renal diseases such as IgA nephropathy, which can progress through a process of gradual glomerular
fibrosis and loss of functional GBM to end-stage renal disease. Metalloproteinase expression has already
been well-characterized in experimental immune complex-mediated glomerulonephritis such as the
anti-Thy 1.1 rat model (Bagchus W. M., et al., Lab. Invest., 1986;55:680-687; Lovett D. H., et al., Am. J.
Pathol., 1992;141:85-98).

In another embodiment, the protein of the invention can be used as a treatment or diagnostic tool
for disorders of the gingiva. Collagenase and stromelysin activities have been demonstrated in fibroblasts isolated from
inflamed gingiva (Uitto V. J., et al., J. Periodontal Res., 1981;16:417-424), and enzyme levels have been
correlated to the severity of gum disease (Overall C. M., et al., J. Periodontal Res., 1987;22:81-88).

In another embodiment, the protein of the invention is useful for treating or detecting ulcers.
Proteolytic degradation of extracellular matrix has been observed in corneal ulceration following alkali
burns (Brown S. I., et al., Arch. Ophthalmol., 1969;81:370-373). Thiol-containing peptides inhibit the
collagenase isolated from alkali-burned rabbit corneas (Burns F. R., et al., Invest. Ophthalmol.,
1989;30:1569-1575). Stromelysin, a member of the MMP family, is produced by basal keratinocytes in a
variety of chronic ulcers (Saarialho-Kere U. K., et al., J. Clin. Invest., 1994;94:79-88). Stromelysin-1
mRNA and protein were detected in basal keratinocytes adjacent to but distal from the wound edge in
what probably represents the sites of proliferating epidermis. Stromelysin-1 may thus prevent the
epidermis from healing.

In another embodiment, the protein of the invention can be used as a treatment or diagnostic tool
for tumor angiogenesis. Inhibitors of MMPs have shown activity in models of tumor angiogenesis
(Taraboletti G., et al., Journal of the National Cancer Institute, 1995;87:293; and Benelli R., et al., 
Oncology Research, 1994;6:251-257). Davies et al., (Cancer Res., 1993;53:2087-2091) reported that a 
peptide decreased the tumor burden and prolonged the survival of mice bearing human ovarian carcinoma
xenografts. A peptide of the conserved MMP propeptide sequence was a weak inhibitor of gelatinase A
and inhibited human tumor cell invasion through a layer of reconstituted basement membrane (Melchiori 
A., et al., Cancer Res., 1992;52:2353-2356), and the natural tissue inhibitor of metalloproteinase-2 
(TIMP-2) also showed blockage of tumor cell invasion in in vitro models (DeClerck Y. A., et al., Cancer 
Res., 1992;52:701-708). Studies of human cancers have shown that gelatinase A is activated on the
invasive tumor cell surface (Strongin A. Y., et al., J. Biol. Chem., 1993;268:14033-14039) and is retained
there through interaction with a receptor-like molecule (Monsky W. L, et al., Cancer Res., 1993;53:3159- 
3164). 

In another embodiment, the protein of the invention can be used to treat and diagnose rheumatoid
arthritis. Collagenases have been implicated in a number of diseases, including rheumatoid arthritis
(Mullins, D. E. et al 1983), and it has been proposed to use MMP inhibitors in the treatment of this 
condition. Several investigators have demonstrated consistent elevation of stromelysin and collagenase in 
synovial fluids from rheumatoid and osteoarthritis patients as compared to controls (Walakovits L. A., et 
al., Arthritis Rheum., 1992;35:35-42; Zafarullah M., et al., J. Rheumatol., 1993;20:693-697). TIMP-1
and TIMP-2 prevented the formation of collagen fragments, but not proteoglycan fragments, from the
degradation of both the bovine nasal and pig articular cartilage models for arthritis, while a synthetic 
peptide hydroxamate could prevent the formation of both fragments (Andrews H. J., et al, Biochem. 
Biophys. Res. Commun, 1994;201:94-101). 

In another embodiment, the protein of the invention is used to treat or diagnose inflammation.
Gijbels et al. (J. Clin. Invest., 1994;94:2177-2182) recently described a peptide that suppressed the
development or reversed the clinical expression of experimental allergic encephalomyelitis (EAE) in a 
dose dependent manner, suggesting the use of MMP inhibitors in the treatment of autoimmune 
inflammatory disorders such as multiple sclerosis. A recent study by Madri has elucidated the role of 
gelatinase A in the extravasation of T-cells from the blood stream during inflammation (Ramanic A. M.
and Madri J. A., J. Cell Biology, 1994;125:1165-1178). This transmigration past the endothelial cell
layer is coordinated with the induction of gelatinase A and is mediated by binding to the vascular cell 
adhesion molecule-1 (VCAM-1). Once the barrier is compromised, edema and inflammation are 
produced in the CNS. Leukocytic migration across the blood-brain barrier is known to be associated with 
the inflammatory response in EAE. Inhibition of the metalloproteinase gelatinase A would block the
degradation of extracellular matrix by activated T-cells that is necessary for CNS penetration. These
studies provided the basis for the belief that an inhibitor of stromelysin-1 and/or gelatinase A will treat
diseases involving disruption of extracellular matrix resulting in inflammation due to lymphocytic 
infiltration, inappropriate migration of metastatic or activated cells, or loss of structural integrity 
necessary for organ function. 

The present invention provides the use of an MMP inhibitor in the manufacture of a medicament
for the treatment or prophylaxis of scars. Collagen is the major component of scar and other contracted
tissue and as such is the most important structural component to consider. Contraction of tissues 
comprising extracellular matrix components, especially of collagen-comprising tissues, may occur in 
connection with many different pathological conditions and with surgical or cosmetic procedures.
Contracture, for example, of scars, may cause physical problems, which may lead to the need for medical
treatment, or it may cause problems of a purely cosmetic nature. 

In experiments on in vitro models of scar contraction, collagen appears to be invaded and
permanently remodelled by fibroblasts, and such invasion and remodelling is inhibited by collagenase
inhibitors. The remodelling generally appears as contraction of the collagen, the contraction of which is
inhibited by inhibition of collagenase. Furthermore, inhibition of other MMPs also results in inhibition of 
contraction. 

The present invention also provides the use of an MMP inhibitor in the treatment or prophylaxis 
of a natural or artificial tissue comprising extracellular matrix components to inhibit, i.e. restrict, hinder or
prevent, contraction of the tissue, especially contraction resulting from a pathological condition or from
surgical or cosmetic treatment. 

Cosmetic treatments, such as chemical or physical dermal abrasion, used as anti-ageing 
treatments, cause trauma to the skin. Use of MMP inhibitors during the healing process which occurs after 
the initial abrasion is a cosmetic use of MMP inhibitors according to the present invention. 

The present invention also provides the use of an MMP inhibitor to inhibit, i.e. restrict, hinder or
prevent, invasion by cells, especially fibroblasts, into tissue comprising an extracellular matrix and/or
migration by cells, especially fibroblasts, in or through tissue comprising an extracellular matrix. 

In another embodiment, the present protein is used to prevent or reduce contracture of scar tissue 
resulting from eye surgery. Glaucoma surgery to create new drainage channels often fails due to scarring
and contraction of tissues. A method of preventing contraction of scar tissue formed in the eye, such as
the application of a suitable agent, is therefore invaluable. Such an agent may also be used in the control 
of the contraction of scar tissue formed after corneal trauma or corneal surgery, for example laser or 
surgical treatment for myopia or refractive error in which contraction of tissues may lead to inaccurate 
results. It is also useful in cases where scar tissue is formed on/in the vitreous humor or the retina, for
example, that which eventually causes blindness in some diabetics and that which is formed after
detachment surgery, called proliferative vitreoretinopathy. Other uses include where scar tissue is formed 
in the orbit or on eye and eyelid muscles after squint, orbital or eyelid surgery, or thyroid eye disease and 
where scarring of the conjunctiva occurs as may happen after glaucoma surgery or in cicatricial disease, 
inflammatory disease, for example, pemphigoid, or infective disease, for example, trachoma. A further
eye problem associated with the contraction of collagen-comprising tissues for which the methods and
medicaments of the present invention may be used is the opacification and contracture of the lens capsule 
after cataract extraction. 

In a preferred embodiment, the protein of the invention can be used for the treatment of burns. 
Contraction of collagen-comprising tissue, which may also comprise other extracellular matrix
components, frequently occurs in the healing of burns. The burns may be chemical, thermal or radiation
burns and may be of the eye, the surface of the skin or the skin and the underlying tissues. It may also be 
the case that there are burns on internal tissues, for example, caused by radiation treatment. 

A further aspect of the present invention is the inhibition of the contraction of skin grafts. Skin
grafts may be applied for a variety of reasons and may often undergo contraction after application. As
with the healing of burnt tissues the contraction may lead to both physical and cosmetic problems. It is a 
particularly serious problem where many skin grafts are needed as, for example, in a serious burns case. 

An associated area in which the medicaments and methods of the present invention are of great 
use is in the production of artificial skin. To make a true artificial skin it is necessary to have an epidermis
made of epithelial cells (keratinocytes) and a dermis made of collagen populated with fibroblasts. It is
important to have both types of cells because they signal and stimulate each other using growth factors. A 
major problem up until now has been that the collagen component of the artificial skin often contracts to 
less than one tenth of its original area when populated by fibroblasts. MMP inhibitors, for example,
collagenase inhibitors, may be used to inhibit the contraction to such an extent that the artificial skin can
be maintained at a practical size.

Protein of SEQ ID NO:276 (157-15-4-0-B11-CS)

The protein of SEQ ID NO:276, encoded by the cDNA of SEQ ID NO:35, is a variant of a testis- 
specific isoform of human calpastatin protein (Genseq accession number W19395). The protein of SEQ 
ID NO:276 contains 2 potential transmembrane segments (position 5 to 25 and position 109 to 129)
predicted by the software TopPred II (Claros and von Heijne, CABIOS applic. Notes, 10:685-686
(1994)), and a signal peptide (position 8 : LAVELTLLGLAIL/AI). Like the human calpastatin protein 
(Genseq accession number W19395), the protein of SEQ ID NO:276 is over-represented in testis. 

Calpastatin is a physiological inhibitor of calpains. Calpains, a group of ubiquitous Ca2+-activated
cytosolic proteases, are thought to participate in cytoskeletal remodeling events, cellular
adhesion, shape change, and mobility by the site-specific regulatory proteolysis of membrane- and actin-
associated cytoskeletal proteins (Beckerle et al., Cell 51:569-577, 1987; Yao et al., Am. J. Physiol.
265(pt. 1):C36-46, 1993; and Shuster et al., J. Cell Biol. 128:837-848, 1995). Calpains have also been
implicated in the pathophysiology of cerebral and myocardial ischemia, platelet activation, NF-kB 
activation, Alzheimer's disease, muscular dystrophy, cataract progression and rheumatoid arthritis. There 

30 is considerable interest in inhibitors of calpain, as cellular adhesion, cytoskeletal remodeling events and 
cell mobility are linked to numerous pathologies (Wang et al., Trends in Pharm. Sci. 15:412-419, 1994; 
Mehdi, Trends in Biochem. Sci. 16:150-153, 1991). In addition, as the calpain/calpastatin system is 
involved in membrane fusion events for several cell types, and calpain can be detected in human sperm 
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and testes extracts by Western blotting with specific antisera, tCAST may modulate calpain in the 
calcium-mediated acrosome reaction that is required for fertilization (Li S et al., Biol Reprod, 63(1): 172- 
8, 2000). 

Calpastatin consists of a unique N-terminal domain (domain L) and four repetitive
protease-inhibitor domains (domains 1-4) (Lee WJ et al., J Biol Chem, 267(12):8437-42, 1992). The isolated
cDNAs from various mammalian species have conspicuous differences in the regions encoding the
N-terminal sequences and can be classified into four types. Alternative splicing is most likely the cause for
the molecular diversity, and the multiple isoforms are implicated in specific physiological roles (Lee WJ
et al., J Biol Chem, 267(12):8437-42, 1992). Type IV (or human tCAST), a shorter isoform, is
specifically expressed in testis (Takano J et al., J Biochem Tokyo; 128(1):83-92, 2000). Human tCAST
consists of a 40-amino-acid N-terminal T domain plus a part of domain II and all of domains III and IV
from the somatic isoform. The protein of SEQ ID NO:276 shows extensive homology to the N-terminal
region of the testis basic specific protein (U60665) and the human calpastatin protein (W19395). The
homologous region corresponds to domains T and II of the human calpastatin protein (W19395). The T
domain targets cytosolic localization and membrane association of tCAST, whereas domain I of somatic
calpastatin proteins (sCAST) exhibits a nuclear localization function (Li S et al., Biol Reprod,
63(1):172-8, 2000).

It is believed that the protein of SEQ ID NO:276 is a member of the calpastatin family and, as 
such, plays a role in cytoskeletal remodeling events, cellular adhesion, shape change, and mobility by the
site-specific regulatory proteolysis of membrane- and actin-associated cytoskeletal proteins. Preferred
polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID NO:276 from 
positions 1 to 119. Other preferred polypeptides of the invention are any fragments of SEQ ID NO:276 
having any of the biological activities described herein. 

One embodiment of the present invention relates to methods of using the protein of the invention
or part thereof in assays to detect the presence of calpain in a biological sample, such as in bodily fluids,
in tissue samples, or in mammalian cell cultures. As calpastatin can bind calpain (Murachi, Biochemistry
Int., 18(2):263-294, 1989), the protein of the invention can be used in assays and diagnostic kits to test for the
presence of calpain using techniques known to those skilled in the art. Preferably, a defined quantity of
the protein of the invention or part thereof is added to the sample under conditions allowing the formation
of a complex between the protein of the invention or part thereof and calpain, and the presence of the complex and/or
the free protein of the invention or part thereof is assayed and compared to a control. Calpastatin has
been shown to be useful as a marker of intracellular calpain activation, and can be used for monitoring the
involvement of calpain in pathological situations (De Tullio et al., FEBS Lett., 475(1):17-21, 2000).
Calpain has been implicated in cytoskeletal protein degradation involved in the pathophysiology of
ischemia and disorders like Alzheimer's disease (Wronski et al., J. Neural Transm., 107(2):145-157,
2000), apoptosis in neural cells of rats with spinal cord injury (SCI) (Ray, Brain Res., 867(1-2):80-9, 2000),
cell fusibility (Kosower et al., Methods Mol Biol., 144:181-94, 2000) and other physiopathologies.
Assays detecting any increased calpain level in a cell would thus allow the diagnosis of any of the
herein-described diseases or conditions. In addition, a recent study showed that, beyond their proteolytic
activities on cytoskeletal proteins and other cellular regulatory proteins, calpain-calpastatin systems can
also affect expression levels of genes encoding structural or regulatory proteins (Chen et al., Am. J.
Physiol. Cell Physiol., 279:C709-C716, 2000). Thus, the ability to detect calpastatin and calpain levels
will likely be useful for the diagnosis of an even larger number of diseases and conditions.

In another embodiment, the polynucleotides or polypeptides of the invention may be used for the
detection of gametes, or of specific structures within the gametes, using any technique known to those
skilled in the art, including those involving the use of specific antibodies and nucleic acid probes.
Various studies have shown that calpastatin is present in the sperm acrosome (Li et al., Bio. of Reprod.,
63:172-178, 2000), and more precisely between the plasma membrane and outer acrosomal membrane of
cynomolgus macaque spermatozoa (Yudin AI, J Androl, 21(5):721-9, 2000). The ability to visualize
spermatozoa generally, or the sperm acrosome in particular, has obvious utility for a number of
applications, including for the analysis of infertility in patients, as described below.

Another embodiment of the present invention relates to a method of inhibiting a calpain in a cell.
Various studies have shown that it is possible to inhibit calpains dose dependently in cell free protease
activity assays: the calpain inhibitor Cerebrolysin can protect microtubule associated protein 2 (MAP2) in
a rat model of acute brain ischemia (Wronski et al., J. Neural Transm. Suppl., 59:263-272, 2000), and
E-64-D, a cell permeable and selective inhibitor of calpain, can attenuate calpain activity associated with
apoptosis in rat SCI (Ray et al., Brain Res., 867(1-2):80-9, 2000). Similarly, it is believed that the protein
of SEQ ID NO:276 can be used to inhibit calpain in vitro or in vivo. As calpain has been implicated in a
number of pathological processes, diseases, and conditions, such as the pathophysiology of cerebral and
myocardial ischemia, platelet activation, NF-kB activation, Alzheimer's disease, muscular dystrophy,
cataract progression and rheumatoid arthritis, any of these diseases or conditions can be treated or
prevented by increasing or decreasing the activity or expression of the present protein in cells of a
mammal affected by the disease or condition. Such an increase can be effected in any of a number of
ways, including, but not limited to, introducing a polynucleotide encoding the protein of the invention,
operably linked to a promoter, into a cell; and administering to a cell a compound that increases the
activity or expression of the protein of the invention. In addition, the expression or activation of the
protein of the invention can be inhibited in any of a large number of ways, including using antisense
oligonucleotides, antibodies, dominant negative forms of the protein, and using heterologous compounds
that decrease the expression or activation of the protein. Such compounds can be readily identified, e.g.
by screening candidate compounds and detecting the level of expression or activity of the protein using
any standard assay.

In another preferred embodiment, the protein of the invention can be used to modulate and/or characterize
fertility, including for the treatment or diagnosis of infertility, and for contraception. As the
calpain/calpastatin system has been implicated in the acrosomal reaction which is a required step in
fertilization, it is likely that the over- or under-expression or activation of the present protein disrupts this
reaction, thereby inhibiting fertility. Thus, the cause of infertility in many patients can likely be detected
by detecting the level of expression of the present protein, where an abnormal level of activity or
expression of the protein indicates that a cause of infertility involves the calpain-dependent acrosomal
reaction. Such a diagnosis would also point to methods of treating the infertility, e.g. by increasing or
decreasing the expression or activation of the protein in spermatozoa. Alternatively, for contraception,
the expression or activation of the protein can be artificially disrupted, for example by increasing the
protein level using polynucleotides encoding the protein, using the protein itself, or using activators of
protein expression or activity, or by decreasing the protein level using inhibitors such as antisense
oligonucleotides, antibodies, dominant negative forms of the protein, and using heterologous compounds
that inhibit protein expression or activity.

Protein of SEQ ID NO:295 (internal designation 181-20-3-0-B5-CS)

The protein of SEQ ID NO:295, encoded by the cDNA of SEQ ID NO:54, shows homology to the
rat, bovine, and human uromodulin precursor, Tamm-Horsfall urinary glycoprotein, and human pancreatic
secretory granule membrane major glycoprotein GP2 precursor. SEQ ID NO:295 exhibits homology in the
5' region (over 40% identical and 60% similar) to both GP2 and uromodulin. Like GP2 and uromodulin, the
homologous segment contains EGF-like calcium-binding domains, several potential disulfide bonds, and a
number of potential N-linked glycosylation sites. Calcium-binding EGF-like domains contain a
calcium-binding site at the N-terminus, and have been found in proteins which require calcium for their biological
activity. Non-limiting examples of proteins which contain calcium-binding EGF-like domains include: (1)
Coagulation Factors X, VII, IX; (2) LDL receptors; (3) thrombomodulin; and (4) fibrillin-1. Downing et al.
[Cell 85:597-605 (1996)] described disease-causing mutations that destabilized a covalently-linked pair of
Ca2+-binding EGF-like domains in fibrillin-1 (associated with Marfan Syndrome). These domains form a
rigid rod-like arrangement, stabilized by interdomain calcium binding and hydrophobic interaction.
Uromodulin (URO) is a 90-100 kDa glycoprotein synthesized by epithelial cells of the ascending loops of
Henle and convoluted tubule of the bladder. Except for glycosylation, URO is identical to Tamm-Horsfall
protein (THP), the most abundant protein in normal human urine. The relative abundance and specific
nephronal location of URO suggest that it may have important physiologic functions in the urinary system.

URO has also been found to be an immunosuppressive glycoprotein, inhibiting antigen-induced 
human T-cell proliferation. More recent studies have shown that URO can trigger the inflammatory 
response of neutrophils and stimulate human mononuclear cells to proliferate and release cytokines and
gelatinase. 

Uromodulin has been shown to play a role in regulating the circulating activity of cytokines since
it binds to recombinant interleukin-1 and -2 and tumor necrosis factor (TNF) with high affinity. Although
URO does not inhibit the cytotoxic activity of TNFα as monitored by lysis of tumor cell targets, it
interacts with recombinant TNFα via carbohydrate chains. This interaction may be critical in promoting
clearance and/or reducing in vivo toxicity of TNFα and other lymphokines. Endotoxic shock and sepsis
are caused by cytokines IL-1 and TNFα. Since URO appears to exhibit inhibitory activity against IL-1
and TNFα, URO may be effective as a therapeutic agent against these conditions. Uromodulin has also
been implicated as a possible inhibitor of certain types of bacterial infection in the bladder and urinary
tract. URO has the ability to bind to type 1 pilus of Escherichia coli and prevent attachment to the
surface of epithelium.

SEQ ID NO:295 also has homology to the glycoprotein GP-2. GP-2 is an integral protein of the 
pancreatic zymogen granule membrane. GP2 is anchored to the lipid bilayer via a glycosyl 
phosphatidylinositol (GPI) linkage and released by a calcium-activated enzyme into the content of the
zymogen granule. Through the process of exocytosis, GP2 is discharged into the pancreatic duct. The
protein is also soluble in the zymogens stored in the granule, secreted by the pancreas, and detected in the 
pancreatic secretions. GP2 appears to play a role in progression of pancreatitis, an inflammation of the 
pancreas accompanied by autodigestion of pancreatic tissue by its own enzymes. After cloning and 
sequencing of GP2, a search of the Genbank database revealed one homologous protein, namely
uromodulin. Studies reveal that GP2 and URO not only share structural homology, but functionally are
similar in that both can form ductal precipitates under pathological conditions. The aggregation of these 
precipitates in the pancreas may lead to obstruction of the pancreatic ducts and play a critical role in 
development of pancreatitis. Similarly, aggregation of URO in the kidney may lead to blockage of the 
renal tubules and result in renal disease. 

The subject invention provides the protein of SEQ ID NO:295 and polynucleotide sequences
encoding SEQ ID NO:295. Also included in the invention are biologically active fragments of the protein
encoded by SEQ ID NO:295 and polynucleotide sequences encoding these biologically active fragments.
"Biologically active fragments" are defined as those peptide or polypeptide fragments of SEQ ID NO:295
which have at least one of the biological functions of the full length protein (e.g., the ability to chelate
calcium, bind to E. coli pili, or cause immunomodulation of an individual). In one embodiment, the
polypeptides of SEQ ID NO:295 are interchanged with the polypeptides encoded by the human cDNA of
clone 181-20-3-0-B5-CS.

The invention also provides variants of SEQ ID NO:295. These variants have at least about 80%,
more preferably at least about 90%, and most preferably at least about 95% amino acid sequence identity
to the amino acid sequence of SEQ ID NO:295. Variants according to the subject invention also have at
least one functional or structural characteristic of SEQ ID NO:295, such as the biological functions
described above or EGF-like calcium-binding domains. The invention also provides biologically active
fragments of the variant proteins. Unless otherwise indicated, the methods disclosed herein can be
practiced utilizing SEQ ID NO:295, or variants thereof. Likewise, the methods of the subject invention
can be practiced using biologically active fragments of SEQ ID NO:295, or variants of said biologically active
fragments.
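
The identity thresholds above can be made concrete with a short calculation. The Python sketch below computes percent identity over a pre-computed pairwise alignment and reports which of the stated tiers a candidate variant reaches. It assumes the two sequences are already aligned to equal length with "-" as the gap character, and it divides by the number of alignment columns, which is only one of several conventions in use; the function names and the example peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pairwise alignment.

    Columns in which both sequences carry a gap are ignored; a column
    counts as a match only if both residues are identical non-gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def identity_tier(pct):
    """Map a percent identity onto the 80/90/95 tiers named in the text."""
    for cutoff in (95, 90, 80):
        if pct >= cutoff:
            return "at least %d%%" % cutoff
    return "below 80%"


# Hypothetical 10-residue reference and a variant with one substitution.
reference = "ACDEFGHIKL"
variant = "ACDEFGHIKV"
pct = percent_identity(reference, variant)
print(pct, identity_tier(pct))
```

In practice the alignment itself would come from a dedicated alignment program, and the choice of denominator should be stated alongside any reported identity value.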

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode 
SEQ ID NO:295. It is well within the skill of a person trained in the art to create these alternative DNA
sequences which encode proteins having the same, or essentially the same, amino acid sequence. These
variant DNA sequences are, thus, within the scope of the subject invention. As used herein, reference to 
"essentially the same sequence" refers to sequences that have amino acid substitutions, deletions, 
additions, or insertions that do not materially affect biological activity. 

"Recombinant nucleotide variants" are alternate polynucleotides which encode a particular 

protein. They can be synthesized, for example, by making use of the "redundancy" in the genetic code.
Various codon substitutions, such as the silent changes which produce specific restriction sites or codon 
usage-specific mutations, can be introduced to optimize cloning into a plasmid or viral vector or 
expression in a particular prokaryotic or eukaryotic host system, respectively. 
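
The redundancy referred to above can be illustrated with a minimal Python sketch that builds the standard genetic code and shows two different DNA sequences encoding the same peptide. The two coding sequences and the four-residue peptide are hypothetical examples, not taken from SEQ ID NO:54, and the helper names are assumptions introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop).
    Trailing bases that do not fill a complete codon are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(aa):
    """All codons that encode the given amino acid."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


# Two hypothetical coding sequences that differ at several positions
# yet encode the same peptide.
seq_a = "ATGCTGAGCCGT"
seq_b = "ATGTTATCGAGA"
print(translate(seq_a), translate(seq_b))
print(synonymous_codons("L"))
```

Choosing among synonymous codons in this way is what allows silent changes, such as introducing a restriction site or matching a host's codon usage, without altering the encoded protein.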

SEQ ID NO:295, and variants thereof, can be used to produce antibodies according to methods
well known in the art. The antibodies can be monoclonal or polyclonal. Antibodies can also be
synthesized against fragments of SEQ ID NO:295 as well as variants of SEQ ID NO:295 according to
known methods. The subject invention also provides antibodies which specifically bind to biologically
active fragments of SEQ ID NO:295 or biologically active fragments of variants of SEQ ID NO:295.

The subject invention also provides for immunoassays which are used to screen for, monitor, or
diagnose conditions or disorders associated with liver dysfunction and/or damage. These conditions or
disorders include, but are not limited to, hepatitis, cirrhosis, fibrosis, pericholangitis, portal triaditis,
chronic periportal inflammation, systemic lupus erythematosus, Hodgkin's disease, granulomas, and cell
dysplasia. For a number of the disorders listed above, expression of these genes at
significantly higher or lower levels can be routinely detected in certain liver tissues or cell types (e.g.,
cancerous) or bodily fluids (e.g., serum, plasma, and blood) taken from an individual having such a
disorder, relative to the standard gene expression levels, e.g., the expression level in healthy tissue or
bodily fluid from one or more individuals not having the disorder. These types of assays allow for a
non-invasive method of screening for, diagnosing, or monitoring liver cancer in human subjects. Similarly,
antibodies and small molecules directed to the polypeptides can be used as immunological probes for
differential identification of the diseased tissue(s) or cells.

Additionally, nucleic acid and amino acid sequences of SEQ ID NOs:54 and 295 can be used to 
provide polypeptides and biologically active fragments thereof for the repair of cellular injury following 
liver damage and/or liver transplant. 

Furthermore, polypeptides, or biologically active fragments thereof, can be used for the
modulation of bacterial binding to epithelial cells or as a modulator of bacterial infection. In this aspect
of the subject invention, bacterial cells are contacted with an amount of a composition comprising the 
polypeptide, or biologically active fragments thereof, sufficient to interfere with the binding of bacteria to 
epithelial cells. In one embodiment, the bacteria are coliform bacterial cells. In another embodiment, the
bacterial cells are E. coli. Compositions comprising SEQ ID NO:295, or biologically active fragments
thereof, can be administered in any fashion required to provide a therapeutic effect (e.g., orally, 
intravenously, intrathecally, intraarterially, etc.). 

The subject invention also provides materials and methods for the treatment of endotoxic shock 
and/or sepsis. In this embodiment, a subject can be treated with therapeutically effective amounts of a
composition comprising SEQ ID NO:295, or biologically active fragments thereof.

The subject invention also provides materials and methods for the in vivo or in vitro chelation of
calcium ions (Ca2+). In this aspect of the invention, SEQ ID NO: , or biologically active fragments
thereof, can be used to bind free Ca2+ by addition of the polypeptide, or biologically active fragments
thereof, to solutions, environmental samples, or biological samples. Alternatively, a composition
containing SEQ ID NO:295, or biologically active fragments thereof, can be added to the solutions,
environmental samples, or biological samples in amounts sufficient to bind and remove free Ca2+ from
solution.

In another aspect of the subject invention, SEQ ID NO:295, or biologically active fragments 
thereof, can be used to modulate the immune system of a mammal. In this method, immunomodulatory 
amounts of SEQ ID NO:295, or biologically active fragments thereof, can be administered to a mammal
in a pharmaceutically acceptable carrier. Methods of assessing the stimulated state of the immune system
of the mammal can be practiced according to methods well known in the art.

Protein of SEQ ID NOs:244, 251 (internal designation numbers 105-016-3-0-G10-CS and
105-074-3-0-H10-CS)

The 274 amino acid protein of SEQ ID NO:244, encoded by the cDNA of SEQ ID NO:3, found
in prostate and strongly expressed in the salivary gland, presents strong sequence similarities with the
yeast putative mitochondrial carrier protein PET8 (SWISSPROT accession number P38921) and with
similar proteins conserved among eukaryotes (D. melanogaster and C. elegans: respective
SPTREMBLNEW SPTREMBL SWISSPROT accession numbers Q9VBN7 and Q18934, and S. pombe:
SWISSPROT accession number Q10442). All members of the mitochondrial carrier/transport protein
superfamily exhibit sequence motifs highly similar to P-X-D/E-X-X-K/R-X-R that are also found in 3
positions in the protein of the invention (positions 26 to 33, 108 to 115 and 199 to 206) (Belenkiy et al.,
Biochim. Biophys. Acta, 1467:207-218 (2000)). These mitochondrial carrier protein signatures are
associated with membrane-spanning segments (Belenkiy et al., ibid; Kuan and Saier, Crit. Rev. Biochem.
Mol. Biol., 28:209-233 (1993)). In fact, 4 candidate membrane-spanning segments are identified in the
protein of the invention, from amino acid positions 4 to 24, 51 to 71, 180 to 200 and 240 to 260. Other
hydrophobic regions are found in positions 86 to 107 and 139 to 162. In addition, the protein of SEQ ID
NO:244 presents a putative signal peptide in its very amino-terminal part (position 5 to 19).
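
A signature of the form P-X-D/E-X-X-K/R-X-R spans eight residues, which matches the eight-residue position ranges quoted above, and it can be located in a sequence with a simple pattern search. The Python sketch below is one minimal way to do this; the regular expression is a direct transcription of the motif with X taken as any residue, and the function name and toy sequence are hypothetical (the toy sequence is not SEQ ID NO:244).

```python
import re

# P-X-D/E-X-X-K/R-X-R, with X as any residue. The lookahead lets
# overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(P.[DE]..[KR].R))")


def find_carrier_motifs(seq):
    """Return (start, end, match) tuples, 1-based and inclusive."""
    return [(m.start() + 1, m.start() + 8, m.group(1))
            for m in MOTIF.finditer(seq)]


# Hypothetical toy sequence carrying two copies of the signature.
toy = "MAAAA" + "PLDVIKTR" + "GGGG" + "PAEALRSR" + "LL"
for start, end, match in find_carrier_motifs(toy):
    print(start, end, match)
```

A pattern this short will also match by chance in unrelated proteins, so hits are normally read together with other evidence such as the neighbouring membrane-spanning segments.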

The protein of SEQ ID NO:251, encoded by the cDNA of SEQ ID NO:10, is a 72 amino acid
truncated form of the protein of SEQ ID NO:244. This shorter product results from the absence, in the
cDNA of SEQ ID NO:10, of the 110 bp exon (position 275 to 384) found in the cDNA of SEQ ID NO:3.
Nevertheless, the 72 amino acid encoded protein possesses the putative signal peptide (position 5 to 19),
the first mitochondrial carrier protein signature (position 26 to 33), and two candidate
membrane-spanning segments (positions 4 to 24 and 51 to 71).
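
The size of the missing exon follows directly from its coordinates, and a quick frame check suggests one possible reason why the shorter cDNA encodes a truncated product. The Python sketch below is only arithmetic on the positions quoted above; that the exon lies within the coding region, and that a reading-frame shift accounts for the truncation, are assumptions not stated in the text, and the function names are hypothetical.

```python
def exon_length(start, end):
    """Length in bp of an exon given 1-based inclusive coordinates."""
    return end - start + 1


def skipping_preserves_frame(start, end):
    """True if removing the exon leaves the downstream reading frame
    unchanged, i.e. if the exon length is a multiple of three."""
    return exon_length(start, end) % 3 == 0


length = exon_length(275, 384)
print(length, length % 3, skipping_preserves_frame(275, 384))
```

For positions 275 to 384 the length is 110 bp, which is not a multiple of three, so under the stated assumptions the codons downstream of the junction would be read in a different frame.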

Energy transduction in mitochondria requires the transport of many specific metabolites across
the inner membrane of this eukaryotic organelle. Different types of substrate carrier proteins involved in
energy transfer are found in the inner membrane. These proteins all seem to be evolutionarily related, and
constitute the mitochondrial carrier/transport proteins (MCP/MTP) superfamily. Structurally, MCP/MTP
proteins are typically homodimeric integral transmembrane polypeptides (subunit molecular weight
~30kD) that traverse the inner mitochondrial membrane six times with both the N- and C-termini
localized to the cytosolic side of the membrane. Each 30kD subunit is composed of three tandem repeats
of a domain of approximately one hundred residues (~10kD). This 10kD domain contains two
transmembrane regions and a sequence motif highly similar to P-X1-D/E-X2-X3-K/R-X4-R, where X3 is a
hydrophobic residue (Kuan and Saier, Crit. Rev. Biochem. Mol. Biol., 28:209-233 (1993)). Five protein
families of known function have been identified among the mitochondrial carrier protein superfamily:

(1) The ADP, ATP carrier protein (ACC), ADP/ATP translocases, which under the
conditions of oxidative phosphorylation catalyze the one to one exchange of cytosolic ADP against matrix
ATP across the inner mitochondrial membrane (Fiore et al., Biochimie, 80:137-150 (1998)). The
ADP/ATP transport system can be blocked very specifically by two families of inhibitors: atractyloside
(ATR) and carboxyatractyloside (CATR) on one hand, and bongkrekic acid (BA) and isobongkrekic acid
(isoBA) on the other hand. It is well established that these inhibitors recognise two different
conformations of the carrier protein, the CATR- and BA-conformations, which exhibit different chemical,
immunochemical and enzymatic reactivities. Bakker and collaborators have reported that myopathies
might result from a defect in ADP/ATP transport (Bakker et al., Pediatr. Res. 33:412-417 (1993)).
Namely, the authors describe a 4-fold decrease in the concentration of the ADP/ATP carrier protein in a
patient with a mitochondrial myopathy.

(2) The 2-oxoglutarate/malate carrier protein (OGCP), which exports 2-oxoglutarate into 
the cytosol and imports malate, or other dicarboxylic acids, into the mitochondrial matrix. This protein 
plays an important role in several metabolic processes, such as the malate/aspartate and the
oxoglutarate/isocitrate shuttles (Palmieri et al., J. Bioenerg. Biomembr. 25:493-501 (1993)).

(3) The phosphate group carrier protein, which transports inorganic phosphate groups 
from the cytosol into the mitochondrial matrix (Palmieri et al, ibid). 

(4) The mammalian brown fat uncoupling proteins, such as UCP-1 (thermogenin), are
transmembrane proton-translocating proteins present in the mitochondria of brown adipose tissue, a
specialized tissue which functions in heat generation and energy balance (Jezek and Garlid, Int. J.
Biochem. Cell. Biol. 30:1163-1168 (1998); Klingenberg, J. Bioenerg. Biomembr. 31:419-430 (1999);
Nicolls and Locke, Physiol. Rev. 64:2-40 (1994); Rothwell and Stock, Nature 281:31-35 (1979)).
Mitochondrial oxidation of substrates is accompanied by proton transport out of the mitochondrial matrix,
creating a transmembrane proton gradient. Typically, re-entry of protons into the matrix via ATP
synthase is coupled to ATP synthesis. However, UCP-1 functions as a transmembrane proton transporter,
permitting re-entry of protons into the mitochondrial matrix unaccompanied by ATP synthesis.
Environmental exposure to cold evokes neural and hormonal stimulation of brown adipose tissue, which
increases UCP mediated proton transport, brown fat metabolic activity, and heat production.

Studies with transgenic models indicate that brown fat and UCP-1 play an important role in
energy expenditure in rodents. Transgenic mice in which brown adipocyte tissue was ablated by a toxin
coupled to the UCP-promoter developed obesity and diabetes (Lowell et al., Nature 366:740-742 (1993)).
Obesity in these transgenic animals developed in the absence of hyperphagia, suggesting that the
uncoupled mitochondrial respiration of brown fat is an important component of energy expenditure. In a
separate transgenic mouse model, ectopic expression of UCP-1 in white adipose tissue of
genetically-obese mice led to a significant reduction in body weight and fat stores (Kopecky et al., J. Clin. Invest.
96:2914-2923 (1995)). These studies indicate that activity of UCP-1 is accompanied by energy
expenditure and weight loss in rodents. Two other UCP proteins have recently been cloned. The first,
uncoupling protein-like protein (UCPL) or UCP-2 (59% homologous), is widely expressed (heart, kidney,
lung, placenta and white fat) and enriched in tissues of the lymphoid lineage (Fleury et al., Nature
Genetics 15:269-272 (1997)). The second, UCP-3 (57% homologous), is predominantly localized to
skeletal muscle and brown fat (Boss et al., FEBS Lett. 408:39-42 (1997)). UCP-3 has been found to be
regulated by cold and thyroid hormone (Larkins et al., Biochem. Biophys. Res. Comm. 240:222-227
(1997)).

Thermogenic protein activity, such as that found with UCP-1, may be useful in reducing, or
preventing the development of, excess adipose tissue, such as that found in obesity. Obesity is becoming
increasingly prevalent in developed societies. Attempts to reduce food intake, or to decrease
hypernutrition, are usually fruitless in the medium term because the weight loss induced by dieting results
in both increased appetite and decreased energy expenditure (Leibel et al., New Engl. J. Med.
322:621-628 (1995)). The intensity of physical exercise required to expend enough energy to materially lose
adipose mass is too great for many obese people to undertake on a sufficiently frequent basis. Thus,
obesity is currently a poorly treatable, chronic, essentially intractable metabolic disorder. In addition,
obesity carries a serious risk of co-morbidities, including Type 2 diabetes, increased cardiac risk,
hypertension, atherosclerosis, degenerative arthritis, and increased incidence of complications of surgery
involving general anesthesia.

(5) The tricarboxylate transport protein (or citrate transport protein), which is involved in 
citrate-H+/malate exchange. This protein is important for the bioenergetics of hepatic cells as it provides 
a carbon source for fatty acid and sterol biosyntheses, and NAD for the glycolytic pathway (Kaplan et al, 
J. Biol. Chem. 268:13682-13690 (1993)). 

It is believed that the protein of SEQ ID NO:244 or part thereof is a member of the mitochondrial
carrier/transport protein superfamily and, as such, plays a role in mitochondrial processes such as
ADP/ATP, malate/aspartate, 2-oxoglutarate/isocitrate, and citrate-H+/malate exchanges across the inner
membrane, phosphate group transport, and physiological roles such as regulation of body weight and
energy balance, muscle nonshivering thermogenesis, fever, and defense against the generation of reactive
oxygen species. Preferred polypeptides of the invention are polypeptides comprising the amino acids of
SEQ ID NO:244 from amino acid positions 26 to 33, 108 to 115 and 199 to 206 on one hand, and from
positions 4 to 24, 51 to 71, 86 to 107, 139 to 162, 180 to 200 and 240 to 260 on the other hand. Other
preferred polypeptides of the invention are fragments of SEQ ID NO:244 having any of the biological
activities described herein. It is believed that the protein of SEQ ID NO:251 is a 72 amino acid truncated
form of the 274 amino acid protein of SEQ ID NO:244, and corresponds to one subunit of the tripartite
structure of mitochondrial carrier/transport proteins. Preferred polypeptides are polypeptides comprising
the amino acids of SEQ ID NO:251 from positions 4 to 24, 26 to 33 and 51 to 71.

The activity of the protein of the invention can be assessed using cultured cells. For example,
nucleic acids encoding the protein of SEQ ID NO:244 can be cloned into a eukaryotic vector and
transfected into a population of cells. Transfected mammalian cells are then tested for their carrier
activity, e.g., the import of ADP, dicarboxylic acids, inorganic phosphate groups, or H+ into the
mitochondrial matrix, and the export of ATP, 2-oxoglutarate, or tricarboxylate-H+ into the cytosol.
These transfected cell lines may allow the development of in vitro assays for the identification of
modulators of the carrier activity, such as atractyloside (ATR), carboxyatractyloside (CATR), bongkrekic
acid (BA) and isobongkrekic acid (isoBA), which were described above in connection with the ADP/ATP
mitochondrial carrier. Such modulators are useful for the treatment of any diseases or conditions
associated with the protein of the invention.

Another embodiment of the invention relates to compositions and methods using the
protein of the invention or part thereof to label mitochondria, or more specifically the inner
mitochondrial membrane, in order to visualize any change in number, topology or morphology
of this organelle, for example in association with a mitochondria-related human disorder, such as
neuroleptic malignant syndrome (NMS) (Kubo et al., Forensic Sci. Int. 115:155-158 (2001)),
Rett syndrome (Armstrong, Brain Dev. 14 Suppl:S89-98 (1992)), Alpers disease (Chow and
Thorburn, Hum. Reprod. 15 Suppl 2:68-78 (2000)) or mitochondrial encephalomyopathies
(Handran et al., Neurobiol. Dis. 3:287-298 (1997)). For example, the protein may be rendered
easily detectable by inserting the cDNA encoding the protein of the invention into a eukaryotic
expression vector in frame with a sequence encoding a tag sequence. Eukaryotic cells
expressing the tagged protein of the invention may also be used for the in vitro screening of
drugs or genes capable of treating any mitochondria-related diseases or conditions. The protein of
the invention can also be used to specifically label cells of the salivary gland or of the prostate,
e.g. for histological analyses or for the identification of the origin of tumor cells.

The protein of the invention can also be used as a carrier/transporter to translocate
radiolabeled or chemically labeled metabolites (ADP, dicarboxylic acids, inorganic phosphate
groups) from the cytosol to the matrix of the mitochondria in order to specifically label this
organelle, e.g. to follow its modifications. For example, radiolabeled or chemically labeled
precursors can be added to an in vitro culture of mammalian cells stably transfected with and
expressing the protein of the invention. The labeling of the organelles can then be stopped at
different times after the beginning of the experiment by adding specific inhibitors of
carrier/transporter proteins, such as atractyloside (ATR), carboxyatractyloside (CATR),
bongkrekic acid (BA), or isobongkrekic acid (isoBA). Cells with labeled mitochondria can be
used for the in vitro screening of drugs or genes capable of causing mitochondrial modifications.

Still another embodiment of the invention relates to methods of delivering
heterologous compounds, either polypeptides or polynucleotides, to the inner membrane of
mitochondria by recombinantly or chemically fusing a fragment of the protein of the invention to
a heterologous polypeptide or polynucleotide. Preferred fragments are the putative signal
peptide, the four membrane-spanning segments and/or any other fragments of the protein of the
invention that may contain targeting signals for mitochondria including but not limited to matrix
targeting signals as defined in Herrmann and Neupert, Curr. Opinion Microbiol. 3:210-4 (2000);
Bhagwat et al. J. Biol. Chem. 274:24014-22 (1999); Murphy Trends Biotechnol. 15:326-30
(1997); Glaser et al. Plant Mol Biol 38:311-38 (1998); Ciminale et al. Oncogene 18:4505-14
(1999). Such heterologous compounds may be used to modulate mitochondrial activities, such
as to induce and/or prevent mitochondrial-induced apoptosis or necrosis. For example, these
heterologous compounds may be used in the treatment and/or the prevention of disorders in
which apoptosis is deleterious, including, but not limited to, immune deficiency syndromes
(including AIDS), type I diabetes, pathogenic infections, cardiovascular and neurological injury,
alopecia, aging, degenerative diseases such as Alzheimer's Disease, Parkinson's Disease,
Huntington's disease, dystonia, Leber's hereditary optic neuropathy, schizophrenia, and
myodegenerative disorders such as "mitochondrial encephalopathy, lactic acidosis, and stroke"
(MELAS), and "myoclonic epilepsy ragged red fiber syndrome" (MERRF). In addition,
heterologous polynucleotides may be used to deliver nucleic acids for mitochondrial gene
therapy, i.e. to replace a defective mitochondrial gene and/or to inhibit the deleterious expression
of a mitochondrial gene.

The invention further relates to methods and compositions used to modify the protein of the
invention. Post-translational modifications encompassed by the invention include N-linked or O-linked
carbohydrate chains, processing of N-terminal or C-terminal ends, attachment of chemical moieties to the
amino acid backbone, chemical modifications of N-linked or O-linked carbohydrate chains, and addition
or deletion of an N-terminal methionine residue as a result of prokaryotic host cell expression. These
post-translational modifications of the protein of the invention may be very useful in the search for its
putative protein partners, using approaches such as screening of an expression cDNA library with a
radiolabeled recombinant protein, as post-translational modifications are of prime importance in
protein-protein interactions. Identification of protein partners of mitochondrial carrier proteins would
allow the study of their regulation ex vivo and in vivo in normal versus pathologic cases (for an example
concerning the UCP1 mitochondrial carrier protein and its 14.3.3 physical partner, see: Pierrat et al.,
Eur. J. Biochem. 267:2680-2687 (2000)).

Another embodiment of the invention relates to compositions and methods using polynucleotide
sequences encoding the protein of the invention or part thereof to establish transgenic model animals (D.
melanogaster, M. musculus), by any method familiar to those skilled in the art. By modulating in vivo the
expression of the transgene with drugs or modifier genes (activator or suppressor genes), animal models
can be developed that mimic human mitochondria-associated disorders such as myopathies or obesity.
These animal models would thus allow the identification of potential therapeutic agents for treatment of
the disorders. In addition, recombinant cell lines derived from these transgenic animals may be used for
similar approaches ex vivo.

In one embodiment, the protein of SEQ ID NO:251, corresponding to the 72 amino acid truncated
form of SEQ ID NO:244, may be used as a dominant negative variant to inhibit the function of the
full-length form of the protein of SEQ ID NO:244 in vitro or in vivo. Inactivation of mitochondrial
carriers in this way may allow the development of animal models for human disorders. Recently, for
example, Lowell and collaborators have shown in the mouse that a targeted destruction of UCP1 by the
diphtheria toxin A chain is able to produce obese animals (Kozak and Koza, ibid., Lowell et al., ibid.).

Protein of SEQ ID NO: 285 (internal designation 174-39-2-0-A3-CS)

The protein of SEQ ID NO:285, encoded by the cDNA of SEQ ID NO: 44 (clone 174-39-2-0-A3-CS),
is overexpressed in cancerous prostate, fetal brain, muscle and placenta. The protein is homologous
to the NADH-cytochrome b5 reductase isoform and to the human electron transport protein.

NADH-cytochrome b5 reductase proteins belong to a flavoenzyme family sharing common
structural features and whose members (ferredoxin-NADP+ reductase, NADPH-cytochrome P450
reductase, NADPH-sulfite reductase, NADH-cytochrome b5 reductase and NADH-nitrate reductase) are
involved in photosynthesis, in the assimilation of nitrogen and sulfur, in fatty-acid oxidation, in the
reduction of methemoglobin and in the metabolism of many pesticides, drugs and carcinogens (Karplus et
al., Science, 251:60-6 (1991)). In addition, cytochrome b5 reductase is thought to play a role in the
prevention of apoptosis following oxidative stress (see review by Villalba et al., Mol Aspects Med 18
Suppl:S7-13 (1997)).

It is believed that the protein of SEQ ID NO: 285 may be an oxidoreductase. Thus it may play a role
in electron transport and general aerobic metabolism and may be associated with mitochondrial membranes.
In addition, the protein of the invention may be able to use FAD and/or molybdopterin as cofactors. It may be
involved in photosynthesis, in the assimilation of nitrogen and sulfur, in fatty-acid oxidation, in the reduction
of methemoglobin and in the metabolism of many pesticides, drugs and carcinogens. Preferred polypeptides
of the invention are fragments of SEQ ID NO: 285 having any of the biological activities described herein.
The oxidoreductase activity of the protein of the invention may be assayed using any technique known to
those skilled in the art. The ability to bind a cofactor may also be assayed using any techniques well known
to those skilled in the art including, for example, the assay for binding NAD described in US patent
5,986,172.

In another embodiment, the protein of the invention or part thereof is used to prevent cells from
undergoing apoptosis. In a preferred embodiment, the apoptosis active polypeptide is added to an in vitro
culture of mammalian cells in an amount effective to reduce apoptosis. Furthermore, the protein of the
invention or part thereof may be useful in the diagnosis, the treatment and/or the prevention of disorders
in which apoptosis is deleterious, including but not limited to immune deficiency syndromes (including
AIDS), type I diabetes, pathogenic infections, cardiovascular and neurological injury, alopecia, aging,
degenerative diseases such as Alzheimer's Disease, Parkinson's Disease, Huntington's disease, dystonia,
Leber's hereditary optic neuropathy, schizophrenia, and myodegenerative disorders such as
"mitochondrial encephalopathy, lactic acidosis, and stroke" (MELAS), and "myoclonic epilepsy ragged
red fiber syndrome" (MERRF).

The invention further relates to methods and compositions using the protein of the invention or
part thereof to diagnose, prevent and/or treat several disorders in which energy metabolism is impaired, or
needs to be impaired, including but not limited to mitochondriocytopathies, necrosis, aging,
neurodegenerative diseases, myopathies, methemoglobinemia, hyperlipidemia, obesity, cardiovascular
disorders and cancer. For diagnostic purposes, the expression of the protein of the invention could be
investigated using any of the Northern blotting, RT-PCR or immunoblotting methods described herein
and compared to the expression in control individuals. For prevention and/or treatment purposes, the
protein of the invention may be used to enhance electron transport and increase energy delivery using any
of the gene therapy methods described herein.

Protein of SEQ ID NO: 368 (internal designation 1 87-45-0-0-1 18-CS)

The protein of SEQ ID NO: 368, encoded by the cDNA of SEQ ID NO: 127, is a 78 amino acid
long polypeptide. The sequence of the protein of SEQ ID NO: 368 is identical to the sequence of the
human Dad1 protein, the defender against apoptotic cell death 1 protein, a subunit of the mammalian
oligosaccharyltransferase (OST), except that the last 43 residues of Dad1 are replaced by a series of 8
different amino acids in the protein of the invention. In addition, the protein of SEQ ID NO: 368 displays
the pfam signature for DAD family proteins from positions 1 to 78 as well as two putative transmembrane
domains from positions 31 to 51 and 54 to 74. The Dad1 protein is a 113 amino acid long protein whose
mRNA is composed of 3 exons [see Genbank accession number D15057 and Nakashima, T. et al. (1993)
Molecular and Cellular Biology 13:6367-6374]. The cDNA of SEQ ID NO: 127 is composed of the first
and third exons of the Dad1 cDNA, whereas the second exon of the Dad1 cDNA is missing. Taken together,
these data indicate that the protein of SEQ ID NO: 368 is a new isoform of the Dad1 protein resulting
from an alternative splicing event.
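
The lengths recited above can be checked for internal consistency with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the exon strings are toy placeholders rather than Dad1 sequence, and the only inputs taken from the text are the figures 113, 43 and 8.

```python
from typing import List, Sequence, Tuple


def isoform_length(full_length: int, residues_replaced: int, residues_added: int) -> Tuple[int, int]:
    """Return (shared N-terminal residues, total isoform length)."""
    shared = full_length - residues_replaced
    return shared, shared + residues_added


def splice(exons: Sequence[str], kept: List[int]) -> str:
    """Join the exons whose 0-based indices are listed in `kept`, in order."""
    return "".join(exons[i] for i in kept)


# Figures recited in the text: Dad1 is 113 residues long, its last 43
# residues are replaced by 8 different residues in the isoform.
shared, total = isoform_length(113, 43, 8)

# Toy illustration of exon skipping (exon 2 omitted); the strings are
# hypothetical placeholders, not real Dad1 exon sequences.
toy_exons = ["AAA", "CCC", "GGG"]
toy_transcript = splice(toy_exons, [0, 2])

if __name__ == "__main__":
    print(f"shared residues: {shared}, isoform length: {total}")
    print(f"toy transcript with exon 2 skipped: {toy_transcript}")
```

Under these figures the isoform shares its first 70 residues with Dad1 and totals 78 residues, which agrees with the stated length of SEQ ID NO: 368; the 8 replacement residues would then occupy positions 71 to 78.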

Asparagine-linked glycosylation is a highly conserved protein modification reaction that occurs in
all eukaryotes. The initial stage in the biosynthesis of N-linked glycoproteins, catalysed by the enzyme
oligosaccharyltransferase (OST), involves the transfer of a preassembled high-mannose oligosaccharide
from a dolichol-linked oligosaccharide donor onto asparagine acceptor sites in nascent proteins in the
lumen of the rough endoplasmic reticulum [for review, see Silberstein, S. et al. (1996) FASEB J 10:
849-858].

Protein glycosylation is essential for the structure and function of many proteins and is involved
in the control of many diverse biological processes (Paulson, Trends in Biol. Sci., 1989, 14, 272; Sadler,
In Biology of Carbohydrates, 2nd Ed., Ginsburg & Robbins, Ed., John Wiley & Sons: New York, 1984,
Vol. 2, pg. 87). For example, protein glycosylation has been found to be crucial for the development,
growth and proper function of complex organisms, while the aberrant glycosylation of proteins has been
associated with diseased and transformed cells.

The mammalian oligosaccharyltransferase is composed of the four ER membrane proteins,
ribophorin I and II (RI and RII), OST48, and DAD1, which form an oligomeric complex. RI and OST48,
and probably also RII, are type I transmembrane proteins. The luminal domain of OST48 interacts with
those of RI and RII and the cytoplasmic domain of OST48 has affinity for the cytoplasmically exposed
N-terminal tail of DAD1 [Kelleher, D. et al. (1997) Proc. Natl. Acad. Sci. USA 94: 4994-4999; Fu, J. et al.
(1997) J. Biol. Chem. 272: 29687-29692].

Dad1 is a small hydrophobic protein, thought to be an integral membrane protein, with a
cytoplasmically located N terminus and up to three transmembrane domains. As is true for the other
subunits of OST, the precise role of Dad1 in N-glycosylation is not known. However, it has been shown
that Dad1 is critical for the function and the structural integrity of the OST complex [Sanjay, A. et al.
(1998) J. Biol. Chem. 273: 26094-26099]. Also, it is worth noting that the Dad1 protein was first identified
in 1993 as a mammalian cell death suppressor since loss of its function induces apoptosis in hamster
BHK21 cells [Nakashima, T. et al. (1993) Molecular and Cellular Biology 13:6367-6374]. Since then,
several reports have confirmed the anti-apoptotic role for Dad1 [Hong, NA. et al. (2000) Dev Biol
220:76-84; Brewster, JL. et al. (2000) Genesis 26: 271-8; Yoshimi, M. et al. (2000) Biochem Biophys
Res Commun 276: 965-9].

Dad1 is a highly conserved protein whose sequence has been determined for diverse organisms
including several vertebrates, a nematode, and several plants. A comparison of these sequences reveals
that the amino-terminal region preceding the first membrane-spanning segment is the least conserved
region of the protein both with respect to length and amino acid sequence identity. The most highly
conserved sequences of Dad1 include the second and third membrane spanning segments, making them
probably the most crucial regions for Dad1 function [Kelleher, D. et al. (1997) Proc. Natl. Acad. Sci. USA
94: 4994-4999]. The importance of the C-terminus region for mediating Dad1 functions has been recently
confirmed [Makishima, T. et al. (2000) J. Biochem (Tokyo) 128:399-405].

Therefore, Dad1 is thought to act as a positive regulator of the oligosaccharyltransferase complex,
and as a negative regulator of apoptosis. In addition, the C-terminus of the Dad1 protein seems to be
important for mediating these functions. As mentioned above, the protein of the invention is a new
isoform of the Dad1 protein resulting from an alternative splicing event. As a result of this alternative
splicing event, the C-terminus of the protein of the invention is shortened and does not display the third
transmembrane domain of Dad1. Since the C-terminus of Dad1 has been shown to be important for
mediating the protein function, it is believed that the protein of the invention instead acts
antagonistically to Dad1. It is worth noting that this type of situation, in which the same gene gives rise
by alternative splicing to different protein products with opposing functions, is a common theme among
apoptosis genes [For a review, see Reed, JC. (1999) Nat. Biotechnol 17: 1064-65].

Thus, it is believed that the protein of the invention of SEQ ID NO: 368 plays a role in the control
of N-glycosylation of cellular proteins. Preferably, the protein of the invention is thought to act as a
positive regulator of apoptosis and a negative regulator of the OST complex. Preferred polypeptides of
the invention are polypeptides comprising the amino acids of SEQ ID NO: 368 from positions 1 to 78,
and 71 to 78. Other preferred polypeptides of the invention are fragments of SEQ ID NO: 368 having any
of the biological activities described herein. The activity of the protein of the invention or part thereof on
protein N-glycosylation may be assayed using any of the assays known to those skilled in the art. For
example, one could use DNA-mediated gene transfer techniques in order to introduce the cDNA sequence
of SEQ ID NO: 127 or part thereof into cell lines so that the protein of SEQ ID NO: 368 or part thereof is
overexpressed in these cell lines. The resulting effect of this overexpression on the N-glycosylation of
proteins can then be studied using immunoblotting or Western blotting of glycoproteins [Makishima et al.
(1997) Genes Cells 2: 129-141; Silberstein et al. (1995) J. Cell Biol. 131: 371-383; Hong et al. (2000)
Developmental Biology 220]. The activity of the protein of the invention or part thereof on cellular
apoptosis may be assayed using any of the assays known to those skilled in the art including those
described by Nakashima et al. (1993) supra.

One object of the present invention is compositions and methods of targeting heterologous
polypeptides to the endoplasmic reticulum by recombinantly or chemically fusing a fragment of the proteins
of the invention to a heterologous polypeptide. Preferred fragments are any fragments of the proteins of the
invention, or part thereof, that may contain targeting signals for the endoplasmic reticulum such as those
described in Pidoux AL, Armstrong EMBO J 1992 Apr;11(4):1583-91; Munro S, Pelham HR Cell 1987 Mar
13;48(5):899-907; Pelham HR Trends Biochem Sci 1990 Dec;15(12):483-6.

In another embodiment, the invention relates to compositions and methods using the protein of
the invention or part thereof to stimulate cells' entry into apoptosis. In a preferred embodiment, the
pro-apoptosis protein of the invention or part thereof is added to an in vitro culture of mammalian or plant
cells in an amount effective to stimulate apoptosis. In another preferred embodiment, the cDNA sequence
of SEQ ID NO: 127 or part thereof may be used to create transgenic animals or plant cells in which the
disclosed protein of the invention or part thereof can be expressed at higher levels than normal whenever
and wherever it is desired. Ways to create transgenic cells in which the expression of the transgene can
be turned on or off whenever it is desired are well known in the art. Increasing the expression level of the
protein of the invention in cells to stimulate programmed cell death may be useful for applications in
which a given species of cells becomes undesirable upon a given event, e.g., infection, transformation, end
of a production process, etc.

Furthermore, the invention relates to methods and compositions using the protein of the invention
or part thereof to diagnose, prevent and/or treat disorders characterized by abnormal cell proliferation
and/or programmed cell death, including but not limited to cancer, immune deficiency syndromes
(including AIDS), type I diabetes, pathogenic infections, cardiovascular and neurological injury, alopecia,
aging, degenerative diseases such as Alzheimer's Disease, Parkinson's Disease, Huntington's disease,
dystonia, Leber's hereditary optic neuropathy, schizophrenia, and myodegenerative disorders such as
"mitochondrial encephalopathy, lactic acidosis, and stroke" (MELAS), and "myoclonic epilepsy ragged
red fiber syndrome" (MERRF). For diagnostic purposes, the expression of the protein of the invention
could be investigated using any of the Northern blotting, RT-PCR or immunoblotting methods described
herein and compared to the expression in control individuals. For prevention and/or treatment purposes
of disorders in which cell proliferation needs to be reduced and/or apoptosis increased, the expression of
the protein of the invention may be enhanced using any of the gene therapy methods described herein or
known to those skilled in the art. For prevention and/or treatment purposes of disorders in which cell
proliferation needs to be enhanced and/or apoptosis reduced, inhibition of endogenous expression of the
protein of the invention may be achieved using any methods known to those skilled in the art including
the triple helix and antisense strategies described herein.

Moreover, antibodies to the protein of the invention or part thereof may be used for detection of
the endoplasmic reticulum for histological purposes using any techniques known to those skilled in the
art.

Protein of SEQ ID NO: 284 (internal designation 174-38-3-0-C9-CS)

The protein of SEQ ID NO: 284 encoded by the cDNA of SEQ ID NO: 43 is overexpressed in
salivary gland. The 406-amino-acid-long protein of the invention, which is similar in size to
fucosyltransferases, displays a Pfam motif of the fucosyltransferase family from residues 70 to 406.
Furthermore, the present protein of the invention is homologous to a putative fucosyltransferase of
Drosophila melanogaster (STR accession number: Q9VLC1 and Q9VLC1). The protein of SEQ ID NO: 284
also shares homology with the alpha1,3 fucosyltransferase (E.C. 2.4.1.152), found in Brachydanio rerio
(EMBL accession number: AB023627), Schistosoma mansoni (GENPEPT accession number:
AF183577-1), cattle (SPTREMBL accession number Q9TQQ3), and human species (GENPEPT
accession number: AJ132772_2). Like fucosyltransferases, the protein of the invention displays the
features of type II transmembrane proteins with a short N-terminal cytoplasmic tail, a 9-29 amino acid
signal-anchor transmembrane domain, and a large C-terminal domain. Furthermore, the present protein of
the invention displays an almost perfect consensus motif of the alpha-1,3 fucosyltransferases from residues
315 to 345 (Breton et al. Glycobiology 1998; 1: 87-94).

Fucosyltransferases are a family of enzymes that catalyze the transfer of fucose from GDP-fucose
to galactose in an alpha1,2 linkage, and to N-acetylglucosamine in alpha1,3-, alpha1,4- and
alpha1,6-linkages. Since all fucosyltransferases use the same nucleotide sugar, their specificity
probably resides in the recognition of the acceptor and in the type of linkage formed. In humans,
fucosyltransferases, which are type II membrane proteins found in the Golgi, can be split into three distinct
families (Breton et al. Glycobiology 1998; 1: 87-94): (1) the alpha-1,2-fucosyltransferases, hFUT1 and
hFUT2, which yield nearly identical products, as only a single carbohydrate linkage differentiates type I
from type II glycans. hFUT1 determines the expression of O-type antigen (H antigen) of the ABO blood
group system on erythrocytes, whereas hFUT2 (Se) determines it in saliva, i.e. secretor status; (2) the
alpha-1,3-fucosyltransferases, which constitute a distinct homogeneous family of proteins, although some
regions display similarities with the alpha-1,2- and alpha-1,6-fucosyltransferases (Breton et al.
Glycobiology 1998;1:87-94). Five alpha-1,3-fucosyltransferases have been characterized to date in
humans, i.e. hFUT3 (Lewis enzyme), hFUT4 (myeloid-type), hFUT5, hFUT6 (plasma-type), and
hFUT7. These are involved in the last steps of the biosynthesis of the carbohydrate antigen sialyl Lewis of
ABH (de Vries et al. J Biol Chem 1995;270:8712-22; Kimura et al. Biochem Biophys Res Commun
1997 8;237:131-7); (3) the alpha-1,6-fucosyltransferase, hFUT8, which is implicated in the synthesis of
N-glycans (Miyoshi et al. Biochim Biophys Acta 1999;1473:9-20).

The fucosylated cell surface glycoconjugates play important roles in physiological and
pathological processes, such as fertilization, embryogenesis, lymphocyte trafficking, immune response,
and cancer metastasis (Staudacher et al. Trends Glycosci Glycotechnology 1996; 8:391-408). More
specifically, the fucosylated cell surface glycoconjugates, which are present on the apical surface of
various epithelia, contribute to resistance to various microbial agents, including bacteria such as
Helicobacter pylori (Umesaki et al. Science 1997;276:964-5) and E. coli (Vogeli et al. Schweiz Arch
Tierheilkd 1997;139:479-84), and viruses such as HIV (Ali et al. Infect Dis 2000;181:737-9). On the other
hand, abnormal upregulation of fucosyltransferases is a common finding in various types of tumors,
which causes an increased production of fucosylated glycoconjugates. Such fucosylated glycoconjugates
can also serve as tumor markers and include (1) the CA19-9 cancer antigen, which is a circulating
sialyl-Lewis a structure produced by hFUT3 and used for diagnosis of pancreatic and gastric cancer
(Koprowski et al. Somatic Cell Genet 1979;5:957-71), and (2) alpha-foetoprotein, whose
alpha-1,6-fucosylation is reduced in hepatoma (Miyoshi et al. Biochim Biophys Acta 1999;1473:9-20).
In addition, aberrant production of fucosylated glycoconjugates can provide a selective growth advantage
by facilitating the extravasation of tumor cells, since they participate in endothelial adhesion through
interaction with E- and P-selectins of endothelial cells (Butcher and Picker, Science 1996;272:60-6).
Consequently, modulation of fucosyltransferase activity can modify tumorigenicity in various tumor
models including hepatoma (Miyoshi et al. Biochim Biophys Acta 1999;1473:9-20) and colorectal
adenocarcinoma (Weston et al. Cancer Res 1999;59:2127-35).

Thus, it is believed that the protein of the invention of SEQ ID NO: 284 is a glycosyltransferase,
preferably a hexosyltransferase, more preferably a fucosyltransferase, even more preferably an
alpha-1,3-fucosyltransferase, and as such plays a role in fertilization, embryogenesis, lymphocyte
trafficking, immune response, cancer metastasis and resistance to various microorganisms. Preferred
polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID NO: 284 from
positions 70 to 406, and 315 to 345. Other preferred polypeptides of the invention are fragments of SEQ
ID NO: 284 having any of the biological activities described herein. The glycosyltransferase activity of the
protein of the invention or part thereof may be assayed using any of the assays known to those skilled in
the art including those described in Palcic et al. Carbohydr Res 1990;196:133-40.

Fucosylated compounds have considerable potential both as therapeutics and as reagents for
clinical assays. However, synthesis of glycosylated compounds of potential commercial and/or
therapeutic interest is difficult because of the very nature of the saccharide subunits. A multitude of
positional isomers in which different substituent groups on the sugars become involved in bond
formation, along with the potential formation of different anomeric forms, are possible. As a result of
these problems, large scale chemical synthesis of most carbohydrates is not possible due to economic
considerations arising from the poor yields of desired products. Enzymatic synthesis using glycosyl
transferases such as fucosyltransferase provides an alternative to chemical synthesis of carbohydrates.
Enzymatic synthesis using glycosidases, glycosyl transferases, or combinations thereof, has been
considered as a possible approach to the synthesis of carbohydrates. As a matter of fact, enzyme-mediated
catalytic synthesis would offer dramatic advantages over the classical synthetic organic pathways,
producing very high yields of carbohydrates economically, under mild conditions in aqueous solutions,
and without generating notable amounts of undesired side products. To date, however, such enzymes are
difficult to isolate, especially from eukaryotic, e.g., mammalian sources, because these proteins are only
found in low concentrations, and tend to be membrane-bound. In addition to these enzymes being
difficult to isolate, the acceptor (peptide) specificity of glycosyl transferases is poorly understood. Thus,
there is a need for obtaining recombinant glycosyl transferases, including fucosyltransferases, that could
be produced in very large amounts.

Thus, the invention relates to methods and compositions using the protein of the invention or part
thereof to synthesize glycosylated compounds, either glycoproteins, glycolipids, or oligosaccharides,
more particularly fucosylated compounds. If necessary, the protein of the invention or part thereof may be
produced in a soluble form by removing its transmembrane domains and/or its Golgi retention signal
using any of the methods known to those skilled in the art including those described in US patent
5,776,772. For example, the protein of the invention or part thereof is added to a sample containing
GDP-fucose and a substrate compound in conditions allowing glycosylation, more particularly
fucosylation, and allowed to catalyze the glycosylation of this compound. In a preferred embodiment, the
enzymatic reaction carried out by the protein of the invention is part of a series of other chemical and/or
enzymatic reactions aiming at the synthesis of complex glycosylated compounds, such as the ones
described in US patents 5,409,817 and 5,374,541. In another preferred embodiment where the method is
to be practiced on a commercial scale, it may be advantageous to immobilize the glycosyltransferase on a
support. This immobilization facilitates the removal of the enzyme from the batch of product and
subsequent reuse of the enzyme. Immobilization of glycosyltransferases can be accomplished, for
example, by removing from the transferase its membrane-binding domain, and attaching in its place a
cellulose-binding domain. One of skill in the art will understand that other methods of immobilization
could also be used and are described in the available literature.

In a preferred embodiment, the present invention relates to processes and
compositions for producing glycosylated compounds, preferably fucosylated compounds, wherein a cell
is genetically engineered to produce the protein of the invention or part thereof and used in combination
with one or several other cells able to produce the donor substrate for the protein of the invention.

In another preferred embodiment, the present invention relates to a process and compositions for
controlling the glycosylation of proteins in a cell wherein an insect, plant, or animal cell is genetically
engineered to produce one or more enzymes that provide internal control of the cell's glycosylation
mechanism. Preferably, the invention relates to a Chinese hamster ovary (CHO) cell line that is
genetically engineered to produce a fucosyltransferase of the present invention either alone or in
combination with other glycosyl transferases. This supplemental fucosyltransferase modifies the
glycosylation machinery to produce glycoproteins having carbohydrate structures that more closely
resemble naturally occurring human glycoproteins. The methods for performing the above process and
making the above compositions are carried out using the methods known in the art and described in U.S.
Patent No. 5,047,335.

Another embodiment of the present invention relates to compositions and methods using the
protein or part thereof to detect fucosylated conjugates. In a preferred embodiment, the protein of SEQ ID
No: 284 or part thereof is used to obtain reagents, such as antibodies. These reagents could be used in
radioimmunoassays, competitive binding assays, Western blot analysis, enzyme-linked immunosorbent
assay (ELISA), immunohistochemistry, or any other technique known to those skilled in the art (Palcic et
al. Carbohydr Res 1990;196:133-40). In a preferred embodiment, antibodies raised against the protein
of the present invention provide tools to specifically visualize salivary or digestive tract tissues (and cells
derived from the tissues). This can be useful for various applications, including the determination of the
origin or identity of cells, e.g. cancerous cells, as well as to facilitate the identification of particular cells
and tissues for, e.g. the evaluation of histological slides. Such assays may also be used for diagnosis in
various disorders including, but not limited to, neoplastic tumors such as salivary, prostate, liver,
digestive tract and pancreas cancers. Various types of samples can be assayed, including tumor
tissues, or other biological samples such as serum or plasma.

The invention further relates to glycosylated compounds, preferably fucosylated compounds,
obtained using any of the processes described herein using the protein of the invention or part thereof.
Such compounds may be used in the diagnosis, prevention and/or treatment of disorders including, but
not limited to, cancer, cystic fibrosis, ulcer, inflammation and immune-based disorders, including
autoimmune disorders such as arthritis, fertility disorders, and hypothyroidism. These conditions include
infectious diseases where active infection exists at any body site, such as meningitis and salpingitis;
complications of infections including septic shock, disseminated intravascular coagulation, and/or adult
respiratory distress syndrome; acute or chronic inflammation due to antigen, antibody and/or complement
deposition; inflammatory conditions including arthritis, cholangitis, colitis, encephalitis, endocarditis,
glomerulonephritis, hepatitis, myocarditis, pancreatitis, pericarditis, reperfusion injury and vasculitis.
Immune-based diseases include but are not limited to conditions involving T-cells and/or macrophages
such as acute and delayed hypersensitivity, graft rejection, and graft-versus-host disease; auto-immune
diseases including type I diabetes mellitus and multiple sclerosis. In a preferred embodiment, these
glycosylated compounds or derivatives thereof may be used as pharmacological agents to trap pathogens
or endogenous ligands, thus reducing the binding of pathogens or endogenous ligands to the endogenous
glycosylated compounds. For example, such compounds may be used to prevent and/or inhibit the
adhesion of cancer cells to the inner wall of blood vessels or aggregation between cancer cells and platelets,
thus reducing cancer metastasis, or to prevent and/or inhibit the adhesion of neutrophils to blood vessel
endothelial cells. Other disorders include infections in which recognition of a glycosylated product is
essential to the development of the infection. Such infections include, but are not limited to, those caused
by Helicobacter pylori, E. coli and viruses such as HIV. In a preferred embodiment, such compounds,
preferably oligosaccharides, are used as gram-positive antibiotics and disinfectants (U.S. Pat. Nos.
4,851,338 and 4,665,060).

The invention further relates to methods and compositions using the protein of the invention or
part thereof for diagnosis, prevention and/or treatment of several disorders in which recognition of
glycosylated compounds, preferably of fucosylated compounds, is impaired or needs to be impaired. For 
diagnostic purposes, the expression of the protein of the invention could be investigated using any of the 
Northern blotting, RT-PCR or immunoblotting methods described herein and compared to the expression
in control individuals. For prevention and/or treatment purposes, inhibiting the endogenous expression of
the protein of the invention may be used to reduce the production of glycosylated compounds detrimental 
to the organism using any of the antisense or triple helix methods described herein as well as antagonists 
of the protein's activity. 

In another embodiment, various substances can be used for the treatment, attenuation and/or
prevention of abnormal conditions associated with unbalanced amounts and/or activity of the
protein of SEQ ID No. 284. Such substances include, but are not limited to, chemical compounds such as
agonists and antagonists, nucleic acids, and antibodies. In particular, the protein of the invention or part
thereof may be used in the development of inhibitors of glycosyltransferases, more particularly inhibitors
of fucosyltransferases, for mechanistic and clinical applications (Taylor, Curr Opin Struc Biol
1996;6:830-7; Colman, Pure Appl Chem 1995;67:1683-8; Bamford, Enz Inhib 1995;10:1-16; Khan &
Matta, In Glycoconjugates, Composition, Structure, and Function. pp361-378. eds., Allen, H. J. &
Kisailus, E. C. Marcel Dekker, Inc. New York, 1992; Thorne-Tjomsland et al., Transplantation
2000;69:806-8; Basset et al., Scand J Immunol 2000;51:307-11). Such substances may be employed for
a variety of therapeutic and prophylactic purposes, including the treatment of certain types of neoplastic
disorders. For instance, substances targeted against the protein of SEQ ID No. 284 can be administered to
treat patients affected with, but not limited to, salivary, prostate, liver, digestive tract and pancreas
cancers. Alternatively, such substances can be used for treatment, attenuation and/or prevention of
infectious disease in order to induce resistance to various microbial agents.

Protein of SEQ ID NO:292 (internal designation 181-10-1-0-D10-CS)

The protein of SEQ ID NO:292 is encoded by the cDNA of SEQ ID NO:51. Accordingly, it will
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:292 described
throughout the present application also pertain to the polypeptide encoded by a nucleic acid included in
clone 181-10-1-0-D10-CS. In addition, it will be appreciated that all characteristics and uses of the
nucleic acid of SEQ ID NO:51 described throughout the present application also pertain to the nucleic
acid included in clone 181-10-1-0-D10-CS.

The protein of SEQ ID NO:292 was identified among the cDNAs from a library constructed from 
fetal liver. Tissue distribution analysis using databases indicated that mRNA encoding this protein was 
found primarily in fetal kidney and fetal liver. 

The protein of SEQ ID NO:292 is most likely a polymorphic variant (92% identity) of the human
secreted protein of SEQ ID NO:197 described in PCT publication WO 9906553-A2, the
disclosure of which is incorporated herein by reference in its entirety. Further, the protein of SEQ ID
NO:292 is homologous to the C-type lectin domain of mouse macrophage asialoglycoprotein-binding
protein (M-ASGP-BP, 36% identity), mouse natural killer (NK) cell surface protein P1-40 (NKR-P1.9,
34%) and human asialoglycoprotein receptor L-H2 from EP 773289-A2 (27%). Thus, the present
invention relates to nucleic acid and amino acid sequences of a lectin-like protein and to the use of these 
sequences in the diagnosis, study, prevention and treatment of disease. 

The protein of SEQ ID NO:292 consists of 111 amino acids. From the amino acid alignments
and the hydrophobicity plots, it has a predicted signal peptide sequence spanning residues 12-24 and one
predicted transmembrane domain spanning residues 5-25. Accordingly, one embodiment of the present
invention is a polypeptide comprising the signal peptide or the transmembrane domain. 
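
The residue ranges quoted above are 1-based and inclusive, so extracting the corresponding segment from a sequence string requires an off-by-one adjustment. The following is a minimal Python sketch only: the function name `segment` is illustrative, and the 111-residue string is a placeholder, not the actual sequence of SEQ ID NO:292, which is given in the sequence listing.

```python
def segment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    return sequence[start - 1:end]


# Placeholder sequence of 111 residues; NOT the actual sequence of SEQ ID NO:292.
placeholder = "M" + "A" * 110

signal_peptide = segment(placeholder, 12, 24)   # predicted signal peptide range
transmembrane = segment(placeholder, 5, 25)     # predicted transmembrane range

print(len(signal_peptide), len(transmembrane))  # 13 21
```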

A number of different protein families share a conserved domain which was first characterized in
some animal lectins and which seems to function as a calcium-dependent carbohydrate-recognition
domain (Drickamer K., J. Biol. Chem., 263:9557-9560, 1988, the disclosure of which is incorporated
herein by reference in its entirety). This domain, which is known as the C-type lectin domain (CTL) or as
the carbohydrate-recognition domain (CRD), consists of about 110-130 residues. There are four cysteines
that are perfectly conserved and involved in two disulfide bonds. Several categories of proteins can be
found in which the CTL domain has been described. Both M-ASGP-BP and NKR-P1 are type II
membrane proteins. Type II membrane proteins in which the CTL domain has been located at the
C-terminal extremity include: 1) Asialoglycoprotein receptors (ASGPR), also known as hepatic lectins. The
ASGPRs mediate the endocytosis of plasma glycoproteins from which the terminal sialic acid residue in
their carbohydrate moieties has been removed. 2) A number of proteins expressed on the surface of NK
cells and some subsets of T cells: NKG2, NKR-P1, Ly-49, CD69, and on B cells: CD72, LyB-2. The
CTL domain in these proteins is distantly related to other CTL domains, and it is unclear whether they
all bind carbohydrates.

M-ASGP-BP is a lectin-like molecule expressed on the surface of activated macrophages and 
specific for terminal D-galactose and N-acetyl-D-galactosamine units (Oda S et al., J. Biochem.,
104:600-605, 1988, the disclosure of which is incorporated herein by reference in its entirety). Experimental
results suggest that M-ASGP-BP participates in the interaction between tumoricidal macrophages and 
tumor cells. 

ASGPR is a membrane protein expressed specifically by hepatocytes. Its function is to take up
asialoglycoproteins from the serum for degradation in the liver. Partially deglycosylated plasma
glycoproteins are efficiently and specifically removed from the circulation by a receptor-mediated
process. In mammals, the ASGPR, which is specific for desialylated (galactosyl-terminal) glycoproteins, is
expressed exclusively in hepatic parenchymal cells. Following binding of the ligand to this cell surface
receptor, the receptor-ligand complex is internalized and transported by a series of membrane vesicles and
tubules to an acidic-sorting organelle where receptor and ligand dissociate (Spiess M et al., J. Biol.
Chem., 260:1979-1982, 1985, the disclosure of which is incorporated herein by reference in its entirety).
Reduction in expression of ASGPR has been reported in response to such liver conditions as hepatic
cirrhosis, liver cancer and regenerated liver (Stadalnik et al., J. Nucl. Med., 26:1233-1242, 1985, the
disclosure of which is incorporated herein by reference in its entirety). It has also been reported that
ASGPR itself is present in serum (Katsugi et al., Alcohol Metabolism and Liver, 12:65-68, 1992, the
disclosure of which is incorporated herein by reference in its entirety), which resulted in significant
research being pursued toward the measurement of serum ASGPR. Furthermore, published results
indicate that labeling compounds binding to ASGPR can be used as good indicators of liver function
(Kudo, et al., Japan Assoc. of Gastrointest. Pathology., 89:1349-1359, 1992, the disclosure of which is
incorporated herein by reference in its entirety).

NK cells constitute the third major population of lymphocytes. They possess the inherent capacity
to kill various tumors and virally infected cells and mediate the rejection of allografts. These properties
allow NK cells to have a major role in the regulation of innate immune responses in particular, and
immunological functions in general. Members of the NKR-P1 family are type-II transmembrane C-type
lectin receptors found on the surface of NK cells and a subset of T lymphocytes (NK T cells). Further, a
subset of NKR-P1 molecules has been identified at the surface of peripheral blood monocytes and
dendritic cells (Poggi A et al., Eur. J. Immunol., 27:2965-2970, 1997, the disclosure of which is
incorporated herein by reference in its entirety). Deficiencies in NKR-P1+ T cells, which preferentially
accumulate in the liver and bone marrow, have been implicated in the susceptibility to many diseases
including insulin-dependent diabetes mellitus (IDDM, Tori M et al. Transplantation, 70:32-38, 2000, the
disclosure of which is incorporated herein by reference in its entirety) and multiple sclerosis (Poggi A et
al., J. Immunol., 162:4349-4354, 1999, the disclosure of which is incorporated herein by reference in its
entirety). NKR-P1 receptors have been shown to activate NK cell cytotoxicity coupled with release of
interferon-γ (IFN-γ, Brown MG., Immunol. Rev., 155:55-75, 1997, the disclosure of which is
incorporated herein by reference in its entirety). However, unlike the well-characterized MHC class I
ligands that regulate the specificity of the Ly-49 family of molecules, which are structurally related to the
NKR-P1 receptors, cognate ligands for the NKR-P1 molecules have yet to be identified. Interestingly, it
has been reported that a subset of the NKR-P1 molecules, NKR-P1B, inhibits NK cell activation (Carlyle
et al., J. Immunol., 162:5917-5923, 1999, the disclosure of which is incorporated herein by reference in
its entirety).

Based on the structural and chemical homologies, the protein of SEQ ID NO:292 was
characterized as a C-type lectin-like, type II membrane protein, whose ligand binding may be calcium
dependent. The protein of SEQ ID NO:292 or fragments thereof may provide the basis for clinical
diagnosis of diseases associated with its induction and/or repression. This protein, fragments thereof or
antagonists/inhibitors thereof may be useful in the diagnosis and treatment of tumors, viral infections,
inflammation, or conditions associated with impaired immunity, organ transplantation, bacterial
infections, autoimmunity, hepatic dysfunction and liver regeneration. Furthermore, the protein of SEQ ID
NO:292 or fragments thereof may be used as a reagent for analyzing the control of gene expression by
IFNs and other cytokines such as IL-12 and IL-4, as well as growth and transcription factors, in normal
and diseased cells.

The protein of SEQ ID NO:292 has homology to the CTL domains of the ASGPR, M-ASGP-BP
and NKR-P1 molecules. The protein of SEQ ID NO:292, in membrane-bound or soluble forms, may have
cytokine receptor activity, cell proliferation/differentiation activity, T cell activation activity, tissue
growth regulating activity, receptor/ligand activity, signal transduction activity, transendothelial
migration-promoting activity, anti-inflammatory activity and tumor inhibition activity, among others.
Accordingly, the protein of SEQ ID NO:292 or fragments thereof may be used in diagnosis and treatment of
diseases such as, but not limited to, autoimmune disorders such as autoimmune hepatitis, rheumatoid
arthritis, Graves disease, systemic lupus erythematosus, Wegener's granulomatosis, sarcoidosis,
polyarthritis, pemphigus, pemphigoid, erythema multiforme, Sjogren's syndrome, inflammatory bowel
disease, autoimmune encephalitis, myasthenia gravis, keratitis, scleritis, lupus nephritis, and allergic
encephalomyelitis; proliferative disorders including various forms of cancer such as leukemias,
lymphomas (Hodgkin's and non-Hodgkin's), sarcomas, melanomas, adenomas, carcinomas of solid tissue,
hypoxic tumors, squamous cell carcinomas of the mouth, throat, larynx, and lung, genitourinary cancers
such as cervical and bladder cancer, hematopoietic cancers, head and neck cancers, and nervous system
cancers, benign lesions such as papillomas, atherosclerosis, angiogenesis; viral infections, in particular
HBV, HCV and HIV infections, as well as other viral- and pathogen-induced infections. The protein of
SEQ ID NO:292 or fragments thereof may also be used to treat conditions associated with inflammation
or immune impairment (e.g. rheumatoid arthritis, osteoarthritis and AIDS), allergy, hepatic cirrhosis and liver
toxicity; as well as genetic disorders, chronic illnesses and infections associated with a decrease in NK, NK
T, macrophage, monocyte and dendritic cell functions. In another embodiment of the invention,
inhibitors of the protein of SEQ ID NO:292 may be used to treat conditions such as multiple sclerosis,
IDDM, graft versus host disease (GVH) and transplanted organ rejection.

Another embodiment relates to methods to treat and/or prevent the bacterial infections that arise
in the liver due to bacterial antigens brought from the intestine via the portal vein. In this embodiment, the
protein of SEQ ID NO:292 may be used to counteract the effects of the bacterial endotoxin
lipopolysaccharide (LPS). Another embodiment of the invention is the use of the protein of SEQ ID
NO:292 or fragments thereof to inhibit NK cells activated by bacterial superantigens or LPS, which
would help treat vascular endothelial injury in conditions such as Kawasaki disease.

The appearance of autoantibodies against the protein of SEQ ID NO:292 can be used as an
indicator for autoimmune hepatitis (AIH), a disease that can lead to cirrhosis and fatal intractable
hepatitis, as well as primary biliary cirrhosis. The nucleic acid sequences encoding the protein of SEQ ID
NO:292 or fragments thereof can be used for producing secreted forms of the protein. They can also be
used to develop products for diagnosis and therapy. Accordingly, recombinant soluble derivatives can be
used for detecting and measuring antibodies specific for the protein of the invention, e.g. by ELISA,
Western blotting, etc. This allows AIH to be diagnosed and distinguished from other diseases.

In another embodiment of the invention, the protein of SEQ ID NO: 292 or fragments or 
derivatives thereof can also be used for the analysis and purification of asialoglycoproteins and to develop 
inhibiting agents against asialoglycoprotein incorporation, or viral and other protein invasion, into liver
cells.

Another embodiment of the present invention relates to polypeptides comprising the protein of 
SEQ ID NO:292 or fragments thereof and polynucleotides encoding the protein of SEQ ID NO:292 or 
fragments thereof. In another aspect the protein of SEQ ID NO:292 or fragments thereof may be used to 
identify specific molecules with which it binds such as agonists, antagonists or inhibitors. In a further 
aspect, the invention relates to methods for identifying agonists and antagonists/inhibitors of the protein
of SEQ ID NO:292, and to treating conditions associated with the protein of the invention or its imbalance
with the identified compounds. In a still further aspect, the invention relates to diagnostic assays for
detecting diseases associated with inappropriate levels or activity of the protein of SEQ ID NO:292.
Another embodiment of the invention relates to methods of measuring the amount of the protein of SEQ
ID NO:292 in serum. Another embodiment relates to the use of labeling compounds that bind to the 
protein of SEQ ID NO: 292 and can be used as good indicators of liver function or NK cell activity, 
among others. 

An embodiment of the present invention relates to methods of using the protein of the invention
or part thereof to identify and/or quantify ligands which may interact with the protein of SEQ ID
NO:292. The protein of SEQ ID NO:292 or fragments thereof may be included in pharmaceutical
preparations for treating cancer or prevention/treatment of other diseases associated with changes in
expression of the protein of the invention (see above). In a preferred embodiment of the invention the
protein of the invention or part thereof is used to modulate the effect of cytokines and related molecules
such as IL-1, IL-2, IL-12 and IFN-γ. The protein of SEQ ID NO:292 may also be used to correct defects in
in vivo models of disease such as autoimmunity, inflammation, pathogen-mediated infection, liver toxicity,
allograft rejection, GVH, as well as tumor models, by injecting the protein either intraperitoneally,
intravenously, subcutaneously or directly in the diseased tissue.

The DNA encoding the protein of SEQ ID NO:292 or fragments thereof may be used in
diagnostic assays for conditions/diseases associated with up-regulation or down-regulation of the
expression of the protein of the invention (see above). The diagnostic assay is useful to distinguish
between absence, presence, and excess expression of the protein and to monitor regulation of levels of the
protein during therapeutic intervention. The DNA may also be incorporated into effective eukaryotic
expression vectors and directly targeted to a specific tissue, organ, or cell population for use in gene
therapy to treat the above-mentioned conditions, including tumors, and/or to correct disease- or
genetic-induced defects in any of the above-mentioned proteins including the protein of the invention. The DNA
may also be used to design antisense sequences and ribozymes, which can be administered to modify
gene expression in NK, NK T, macrophages, monocytes and dendritic cells and to influence expression of
cytokines such as IL-1, IL-2, IL-4, IL-12, and IFN-γ. In vivo delivery of genetic constructs into subjects
can be developed to the point of targeting specific cell types, such as tumor cells, in which expression of the
protein of SEQ ID NO:292 may be affected or may modulate the expression and/or activity of other
proteins such as cytokines, growth factors, their receptors and/or tumor antigens. The DNA may also be
used to identify unknown upstream sequences (e.g. promoters and regulatory elements) by standard
techniques and for research into the control of gene expression by IFNs and other cytokines, as well as
growth and transcription factors in normal and diseased cells. Hybridization probes are useful to detect
DNA encoding the protein of SEQ ID NO:292 (or closely related molecules) in biological samples, and
for mapping the naturally occurring genomic sequence to a particular chromosome/chromosome region.
The DNA may be used to generate and/or treat in vivo animal models of disease, including susceptibility
or resistance to infection, tumors, autoimmune conditions, GVH, allograft rejection and liver toxicity,
based on vaccine, knock-out and transgene technologies.

Antibodies against the protein of SEQ ID NO:292 are useful for the diagnosis of conditions and
disease associated with its expression and to quantify the protein of the invention (e.g. in assays to
monitor patients during therapeutic intervention). Antibodies specific for the protein may include, but are
not limited to, polyclonal, monoclonal, chimeric and single chain antibodies, Fab fragments, and fragments
produced by a Fab expression library. Neutralizing antibodies are especially preferred for diagnostics and therapeutics.
Diagnostic assays for the protein of the invention include methods utilizing the antibody and a label to
detect the protein of SEQ ID NO:292 in human body fluids or extracts of cells or tissues as well as
methods for detecting or measuring antibodies against the protein of SEQ ID NO:292.

The protein of SEQ ID NO:292 and catalytic or immunogenic fragments or oligopeptides
thereof can be used for screening therapeutic compounds in any of a variety of drug screening techniques,
including high-throughput screening. Methods which may be used to quantitate the expression of the nucleotide or
protein of the invention include, but are not limited to, polymerase chain reaction (PCR), RT-PCR,
RNAse protection, Northern blotting, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay
(RIA), fluorescence-activated cell sorting (FACS), immunoprecipitation, and chromatography.

Accordingly, the protein of SEQ ID NO:292 or fragments thereof may be used to purify or enrich 
proteins containing carbohydrates. In such embodiments, the lectin of the present invention is placed in 
contact with carbohydrate-containing proteins under conditions which facilitate specific binding. The 
lectin of the present invention may be fixed to a solid support. After binding, specifically bound proteins
are dissociated using appropriate salt or other conditions.

The protein of SEQ ID NO:292 or fragments thereof may also be used to regulate any of the 
activities described above, including the interaction between tumoricidal macrophages and tumor cells, 
the activity of NK cells, the treatment of bacterial infections resulting from bacterial antigens brought 
from the intestine, or to counteract the effects of bacterial LPS. 

Accordingly, the present invention includes the use of the protein of SEQ ID NO:292, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition in an
individual. For example, the condition may be any of those described above or an abnormality in any of
the functions listed above. In such embodiments, the protein of SEQ ID NO:292, or a fragment thereof, is
administered to an individual in whom it is desired to increase or decrease any of the activities of the
protein of SEQ ID NO:292. The protein of SEQ ID NO:292 or fragment thereof may be administered
directly to the individual or, alternatively, a nucleic acid encoding the protein of SEQ ID NO:292 or a
fragment thereof may be administered to the individual. Alternatively, an agent which increases the
activity of the protein of SEQ ID NO:292 may be administered to the individual. Such agents may be
identified by contacting the protein of SEQ ID NO:292 or a cell or preparation containing the protein of
SEQ ID NO:292 with a test agent and assaying whether the test agent increases the activity of the protein.
For example, the test agent may be a chemical compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:292 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:292 may be identified by contacting the protein of SEQ ID NO:292 or a cell or
preparation containing the protein of SEQ ID NO:292 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify the source of a sample as, for example,
fetal liver or fetal kidney, or to distinguish between two or more possible sources of a sample on the basis
of the level of the protein of SEQ ID NO:292 in the sample. For example, the protein of SEQ ID NO:292
or fragments thereof may be used to generate antibodies using any techniques known to those skilled in
the art, including those described herein. Such antibodies may then be used to identify tissues of
unknown origin, for example, forensic samples, differentiated tumor tissue that has metastasized to
foreign bodily sites, or to differentiate different tissue types in a tissue cross-section using
immunochemistry. In such methods a sample is contacted with the antibody, which may be detectably
labeled, under conditions which facilitate antibody binding. The level of antibody binding to the test
sample is measured and compared to the level of binding to control cells from fetal liver or fetal kidney or
tissues other than fetal liver or fetal kidney to determine whether the test sample is from fetal liver or fetal
kidney. Alternatively, the level of the protein of SEQ ID NO:292 in a test sample may be measured by
determining the level of RNA encoding the protein of SEQ ID NO:292 in the test sample. RNA levels
may be measured using nucleic acid arrays or using techniques such as in situ hybridization, Northern
blots, dot blots or other techniques familiar to those skilled in the art. If desired, an amplification reaction,
such as a PCR reaction, may be performed on the nucleic acid sample prior to analysis. The level of RNA
in the test sample is compared to RNA levels in control cells from fetal liver or fetal kidney or tissues
other than fetal liver or fetal kidney to determine whether the test sample is from fetal liver or fetal
kidney.
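
The comparison of a test-sample level against control levels can be illustrated with a simple nearest-mean rule. This is a sketch under stated assumptions, not a method taken from the present description: the function name `closest_control`, the decision rule, and every numeric level below are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Mapping, Sequence


def closest_control(test_level: float, controls: Mapping[str, Sequence[float]]) -> str:
    """Return the control group whose mean level is nearest to the test level."""
    return min(controls, key=lambda name: abs(mean(controls[name]) - test_level))


# Hypothetical, unitless expression levels (antibody binding or RNA signal).
controls = {
    "fetal liver or fetal kidney": [8.0, 9.5, 10.5],
    "other tissues": [0.5, 1.0, 1.5],
}

print(closest_control(7.2, controls))  # fetal liver or fetal kidney
```

In practice a decision threshold and replicate measurements would need to be established experimentally; the nearest-mean rule is used here only because it is the simplest rule that expresses the comparison.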

In another embodiment, antibodies to the protein of the invention or part thereof may be used for
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:292, including using
methods known to those skilled in the art. For example, an antibody against the protein of SEQ ID
NO:292 or a fragment thereof may be fixed to a solid support, such as a chromatography matrix. A
preparation containing cells expressing the protein of SEQ ID NO:292 is placed in contact with the
antibody under conditions which facilitate binding to the antibody. The support is washed and then the
cells are released from the support by contacting the support with agents which cause the cells to
dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:292 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of
SEQ ID NO:292. In such techniques, the level of the protein of SEQ ID NO:292 in an ill individual is
measured using techniques such as those described herein. The level of the protein of SEQ ID NO:292 in
the ill individual is compared to the level in normal individuals to determine whether the individual has a
level of the protein of SEQ ID NO:292 which is associated with disease.

Protein of SEQ ID NO: 408 (internal designation 179-14-2-0-F11-CS)

The 236 amino acid protein of SEQ ID NO: 409, herein referred to as PNMT A, and encoded by
the cDNA of SEQ ID NO: 168 is found in fetal kidney and fetal brain. PNMT A is a polymorphic variant
of human phosphatidylethanolamine N-methyltransferase (PNMT) (SPTREMBLNEW SPTREMBL
SWISSPROT accession number Q9UHY6). PNMT A differs from the sequence of PNMT (STR
accession number Q9UHY6) by two amino acid residues. Position 95 contains an isoleucine residue (I)
substituted for a valine residue (V); position 130 contains a valine residue (V) substituted for a glutamine
residue (G). PNMT A displays 4 candidate membrane-spanning segments in positions 50 to 70, 83 to 103,
131 to 151 and 196 to 216.
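
Point substitutions of the kind listed above can be located by comparing a variant with its reference sequence position by position. The sketch below assumes two sequences of equal length (substitutions only, no insertions or deletions); the function name `substitutions` is illustrative, and the toy sequences are not those of PNMT or PNMT A.

```python
def substitutions(reference: str, variant: str) -> list:
    """List (position, reference_residue, variant_residue), positions 1-based."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length (substitutions only)")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Toy sequences for illustration; NOT the sequences of PNMT or PNMT A.
reference = "MTRVLGAV"
variant = "MTRILGAG"

print(substitutions(reference, variant))  # [(4, 'V', 'I'), (8, 'V', 'G')]
```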

Catecholamine neurotransmitters [e.g., dopamine, noradrenaline (norepinephrine), adrenaline
(epinephrine)] are synthesized in catecholaminergic neurons from tyrosine, via dopa, dopamine and
noradrenaline, to adrenaline. Four enzymes are involved in the biosynthesis of adrenaline: (1) tyrosine
3-mono-oxygenase (tyrosine hydroxylase, TH); (2) aromatic L-amino acid decarboxylase (AADC, or dopa
decarboxylase, DDC); (3) dopamine beta-mono-oxygenase (dopamine beta-hydroxylase, DBH); and (4)
noradrenaline N-methyltransferase (phenylethanolamine N-methyltransferase, PNMT) (Nagatsu,
Neurosci. Res. 12:315-345 (1991)). PNMT, the final enzyme in the pathway for adrenaline biosynthesis,
catalyses the production of adrenaline from noradrenaline using S-adenosyl-L-methionine as a methyl
donor. For this reason, PNMT serves as a good marker for tissues and cells producing epinephrine
(adrenaline). Studies conducted by Kennedy and collaborators have shown that PNMT is widely
distributed in human tissues including heart and kidney (Kennedy et al., J. Clin. Invest. 95:2896-2902
(1995)).

In some pheochromocytomas, the tumors contain and secrete greater amounts of adrenaline than
do normal adrenal medullas. In a case/control study, Isobe et al. have shown that adrenaline-secreting
pheochromocytomas express significantly greater amounts of PNMT mRNA than do normal adrenal
medullas (Isobe et al., J. Urol. 163:357-362 (2000)). Moreover, PNMT immunoreactivity is only detected
in the adrenaline-secreting tumors. The C-1 region in the rostral ventral lateral medulla contains mainly
adrenaline neurons. These neurons are the tonic vasomotor center of the brain. Burke et al. have
demonstrated changes in the enzymatic activity of PNMT in axon terminals and cell bodies of neurons
from the medulla of patients with Alzheimer's disease. They have also shown that PNMT protein is
decreased in axon terminals in brains from patients with Alzheimer's disease; the decrease in PNMT
appears to be due to retrograde degeneration of epinephrine neurons (Burke et al., Ann. Neurol.
22:278-280 (1987)). In the case of advanced Alzheimer's disease, Burke et al. presented evidence that the
accumulation of PNMT in the perikarya results from diminished transport of this enzyme to axon
terminals (Burke et al., J. Am. Geriatr. Soc. 38:1275-1282 (1990)).

Neurons that contain PNMT have cell bodies in brain stem regions of the rat brain and send
projections mainly into other brain stem areas, such as the hypothalamus and the spinal cord. These
neurons can be affected pharmacologically by various kinds of drugs. PNMT inhibitors currently
represent the only means of modifying adrenaline neurons pharmacologically without affecting
noradrenaline or dopamine neurons in the brain. Experiments conducted in deoxycorticosterone acetate-salt
(DOCA-salt) hypertensive rats and spontaneously hypertensive rats (SHR) have shown that inhibitors of
PNMT lower blood pressure (Goldstein et al., Life Sci. 30:1951-1957 (1982); Lyang et al., Res.
Commun. Chem. Pathol. Pharmacol. 46:319-329 (1984); Chatelain et al., J. Pharmacol. Exp. Ther.
252:117-125 (1990)). Molecules and compounds affecting adrenaline neurons may also be of use in the
treatment of psychiatric disorders and neuroendocrine dysfunction.

One embodiment of the subject invention provides polypeptides comprising the sequence of
PNMT A. Other polypeptides of the invention include polypeptides comprising the amino acids of SEQ
ID NO: 409 from positions 50 to 70, 83 to 103, 131 to 151 and/or 196 to 216. Also encompassed by the
instant invention are biologically active fragments of the PNMT A protein. "Biologically active
fragments" are defined as those peptide or polypeptide fragments of PNMT A which have at least one of
the biological functions of the PNMT A protein (e.g., the ability to catalyze the formation of adrenaline).
In a preferred embodiment, the biologically active fragment of PNMT A contains at least one of the
amino acid substitutions which distinguish PNMT A from PNMT (i.e., an isoleucine residue (I)
substituted for a valine residue (V) at position 95; and/or a valine residue (V) substituted for a glutamine
residue (G) at position 130). In one embodiment, the PNMT A polypeptides of the invention are encoded
by clone 179-14-2-0-F11-CS.
Thus, one embodiment of the invention provides an enzymatic component of the adrenaline
synthetic pathway and methods of producing adrenaline in accordance with methods known to those
skilled in the art. These methods substitute PNMT A, or biologically active fragments thereof, for the
PNMT enzyme used in these known synthetic pathways.

The invention also provides variants of the protein of SEQ ID NO: 409. These variants have at
least about 80%, more preferably at least about 90%, and most preferably at least about 95% amino acid
sequence identity to the amino acid sequence of PNMT A. Variants according to the subject invention
also have at least one functional or structural characteristic of PNMT A, such as the ability to catalyze the
formation of adrenaline. The invention also provides biologically active fragments of the variant proteins.
Unless otherwise indicated, the methods disclosed herein may be practiced utilizing PNMT A or variants
thereof. Likewise, the methods of the subject invention may be practiced using biologically active
fragments of PNMT A, or variants thereof, provided that said biologically active fragments contain the
amino acid substitutions noted supra.
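
The identity thresholds recited above can be illustrated with a short calculation. The sketch below is one
possible convention, not a definition taken from this specification: it assumes the two sequences have already
been aligned (gaps written as "-"), counts identical columns, and divides by the number of columns that are
not gaps in both sequences. The function names, the hard cut-offs standing in for "at least about", and the
236-residue placeholder sequences are all assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns, skipping columns that are gaps in both."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the 80% / 90% / 95% tiers (hard cut-offs assumed)."""
    for threshold in (95.0, 90.0, 80.0):
        if pct >= threshold:
            return f"at least about {threshold:.0f}%"
    return "below 80%"


# Placeholder 236-residue sequences differing at positions 95 and 130 (not real data).
placeholder_ref = "A" * 236
placeholder_var = placeholder_ref[:94] + "V" + placeholder_ref[95:129] + "G" + placeholder_ref[130:]

pct = percent_identity(placeholder_ref, placeholder_var)
print(f"{pct:.2f}% identity, tier: {identity_tier(pct)}")
```

With two differences over 236 residues the identity is 234/236, so a protein differing from a reference at
only two positions falls well within the most preferred tier under this convention.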

One embodiment of the subject invention provides methods of using the protein of the invention,
or biologically active fragments thereof, to label (chemically or isotopically) the adrenaline molecule in
vitro. The labeled adrenaline molecules can then be used to localize receptors in tissue sections by in situ
hybridization experiments.

The invention also provides a fusion protein or polypeptide in which PNMT A, or biologically
active fragments thereof, are combined with another protein (tag) by the use of a recombinant DNA
molecule. The resulting purified and enzymatically active fusion product is then added, in vitro, to the
noradrenaline precursor and to S-adenosyl-L-methionine as a methyl donor. The enzymatic reaction is
then performed in conditions known to those skilled in the art (Burke et al., Proc. Soc. Exp. Biol. Med.
181:66-70 (1986); Morimoto et al., Endocr. J. 40:179-183 (1993)). In this reaction, the methyl group of
S-adenosyl-L-methionine must be labeled isotopically ([14C]-S-adenosyl-L-methionine or
[methyl-3H]-S-adenosylmethionine), or chemically, in order to allow the transfer of a "tagged" methyl
group to the adrenaline molecule.

Similarly, in cells transfected with cDNAs encoding the protein of the invention, PNMT activity
of expressed proteins may be measured by incubating cytosolic fractions with
[14C]-S-adenosyl-L-methionine and normetanephrine for 60 min according to methods described by those
skilled in the art (Morimoto et al., ibid.). Agonists and/or antagonists of PNMT activity may also be tested
(high throughput screening) on transfected cells expressing the wild type form of the protein of the invention.
Again, the effects of such drugs on PNMT enzymatic activity are measured by the methods described above.
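
As a worked illustration of how counts from such a radiometric assay are commonly converted into a specific
activity, the following sketch may be helpful. It is not taken from Morimoto et al.; only the 60 min incubation
comes from the text above. The function name, the example counts, the 50 mCi/mmol label and the 0.2 mg of
protein are hypothetical, and one labeled methyl group is assumed to be transferred per product molecule. The
conversion uses the fact that 1 mCi corresponds to 2.22 x 10^9 disintegrations per minute, so that a label of
1 mCi/mmol yields 2.22 dpm per pmol.

```python
DPM_PER_PMOL_PER_MCI_PER_MMOL = 2.22  # 2.22e9 dpm per mCi divided by 1e9 pmol per mmol


def specific_activity_pmol_per_min_per_mg(
    sample_dpm: float,
    blank_dpm: float,
    label_mci_per_mmol: float,
    incubation_min: float,
    protein_mg: float,
) -> float:
    """Convert blank-corrected dpm in the methylated product into pmol/min/mg protein."""
    if min(label_mci_per_mmol, incubation_min, protein_mg) <= 0:
        raise ValueError("label specific radioactivity, time and protein must be positive")
    dpm_per_pmol = DPM_PER_PMOL_PER_MCI_PER_MMOL * label_mci_per_mmol
    product_pmol = (sample_dpm - blank_dpm) / dpm_per_pmol
    return product_pmol / (incubation_min * protein_mg)


# Hypothetical readings: 12,000 dpm in the sample, 900 dpm in a no-enzyme blank.
activity = specific_activity_pmol_per_min_per_mg(
    sample_dpm=12_000,
    blank_dpm=900,
    label_mci_per_mmol=50.0,
    incubation_min=60.0,  # incubation time given in the text
    protein_mg=0.2,
)
print(f"{activity:.2f} pmol/min/mg")
```

In a screening setting, the same calculation run with and without a candidate compound would give the
fractional change in activity attributable to that compound.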

The invention further relates to methods and compositions used to modify the protein of the
invention (i.e. derivatize the PNMT A protein). Post-translational modifications encompassed by the
invention include N-linked or O-linked carbohydrate chains, processing of N-terminal or C-terminal
ends, attachment of chemical moieties, such as polyethylene glycol, to the amino acid backbone, chemical
modifications of N-linked or O-linked carbohydrate chains, and addition or deletion of an N-terminal
methionine residue as a result of prokaryotic host cell expression. Some of these modifications of the
protein of the invention may facilitate its extraction and purification in prokaryotic expression systems.
Post-translational modifications such as the addition of N-linked or O-linked carbohydrate chains may also
optimize the enzymatic activity of the protein of the invention when it is first produced in a prokaryotic
system.

Another embodiment of the subject invention provides antibodies directed against the protein of
the invention or immunogenic fragments thereof. The antibodies of the invention are useful for the
screening of tissues and cells producing adrenaline or for affinity purification of PNMT or PNMT A.
These antibodies may also be used in the diagnosis of pathologies and disorders such as
pheochromocytomas and Alzheimer's disease, where PNMT A is overexpressed. Methods of performing
affinity purification as well as methods of making polyclonal and monoclonal antibodies are well known
to those skilled in the art.

In therapeutic regimens, neutralizing antibodies may be used as antagonists of PNMT A and used
to treat conditions associated with overexpression of PNMT A. These disorders include, but are not
limited to, hypertension, pheochromocytomas, and advanced Alzheimer's disease (Goldstein et al., Life
Sci. 30:1951-1957 (1982); Lyang et al., Res. Commun. Chem. Pathol. Pharmacol. 46:319-329 (1984);
Chatelain et al., J. Pharmacol. Exp. Ther. 252:117-125 (1990); Isobe et al., J. Urol. 163:357-362 (2000);
Burke et al., J. Am. Geriatr. Soc. 38:1275-1282 (1990)).

Proteins of SEQ ID NOs: 395 and 403 (internal designation: 160-101-3-0-H2-CS and 160-99-4-0-E4-CS
respectively)

The 367-amino-acid-long proteins of SEQ ID NOs: 395 and 403, encoded by the cDNAs of SEQ
ID NOs: 154 and 162 respectively, are polymorphic variants, the first one being overexpressed in fetal
brain and ovary and the second one in fetal brain only. They both contain glutathione S-transferase (GST)
domains from positions 47 to 122, and 206 to 309, which are respectively the G-site and H-site described
below. In addition, they also display two hydrophobic domains (from aa 258 to aa 278 and from aa 338 to
aa 358) which are characteristic of some GST proteins.

Glutathione S-transferase proteins (GSTs) are dimeric proteins that catalyse the conjugation of
glutathione to a wide range of hydrophobic compounds (through the formation of a thioether bond with
their electrophilic centre) to create products which are less reactive, more hydrophilic, and thus more
easily excreted from the cells. The GST superfamily (E.C. 2.5.1.18) is indeed believed to be one of the
most important protein families in the detoxification of reactive electrophiles within living cells.
Glutathione is a cellular tripeptide (gamma-glutamylcysteinylglycine) which is perhaps the most abundant
amino acid derivative contained in the cells of higher life forms. The middle amino acid in glutathione,
cysteine, has a free thiol group which can compete with the nucleophilic site on nucleotide bases for
reaction with electrophiles. Within the cell, glutathione functions so as to conjugate to xenobiotic toxic
molecules in general, and electrophiles in particular, to render the toxic molecules less reactive against
cellular macromolecules and to target the toxic molecules for subsequent metabolic and excretion pathways.

Based on amino acid sequence identity, there are at least seven major classes of GST proteins
(designated alpha, kappa, mu, pi, sigma, theta and zeta). Sequence similarity between classes is rather
low, ranging between 20-30%. However, a single point mutation in the H-subsite region of GST is
enough to shift substrate specificity from class pi to alpha (Nuccetelli M.N. et al.
Biochem. Biophys. Res. Commun. 252:184-189 (1998)). In spite of relatively low sequence identity, the
GSTs exhibit a high degree of structural similarity. It is generally known that the GST molecule binds
quite specifically and with high affinity to glutathione, but binds promiscuously to a wide variety of
xenobiotic, electrophilic, and alkylating chemical agents. All GST enzymes of the four main cytosolic
classes are found in dimeric form with two active sites per dimer, each of which functions independently of
the other. The active site has been characterized as consisting of a glutathione binding region (designated
the G-site) and a non-specific hydrophobic binding region (designated the H-site) to accommodate the
electrophilic substances. Pi-, mu-, alpha- and theta-class crystal structures have been elucidated; all
possess a similar GSH-binding site, but the hydrophobic substrate-binding site (H-subsite) is subject to
variation across the classes (Allardyce C.S. et al. Biochem. J. 343:525-531 (1999)). The GST activity has
been suggested to be involved in the regulation of the assembly of multisubunit complexes by shifting the
balance between glutathione, disulfide glutathione, thiol groups of cysteines, and protein disulfide bonds.
The GST domain is a widespread, conserved enzymatic module that may be covalently or noncovalently
complexed with other proteins. Regulation of protein assembly and folding may be one of the functions of
GST (Koonin EV et al. Protein Sci 3:2045-2054 (1994)).

The cytosolic glutathione S-transferases are known to belong to four classes, designated Alpha,
Mu, Pi and Theta. A fifth class of glutathione S-transferases is a microsomal enzyme found primarily in
liver endoplasmic reticulum. An extensive analysis of the expression of microsomal glutathione transferase
1 in human tissues shows that it predominantly occurs in liver and pancreas. The relative expression
levels in man ranged from liver and pancreas, through kidney, prostate and colon (30-40%) and heart,
brain, lung, testis, ovary and small intestine (10-20%), to placenta, skeletal muscle, spleen, thymus and
peripheral blood leucocytes (1-10%). Liver-enriched expression was detected in human fetal tissues with
lung and kidney displaying lower levels (10-20%). No transcripts could be detected in fetal brain or heart
(Estonius M et al. Eur J Biochem 260:409-13 (1999)). Based on these observations, and the fact that the
enzyme is encoded by a highly conserved single-copy gene, it is suggested that microsomal glutathione
transferase 1 performs essential functions vital to most mammalian cell types. One particular glutathione
S-transferase was also identified in the mitochondrial matrix (Pemble S.E. et al. Biochem. J. 319:749-754
(1996)).

GST and GST-like proteins are widely distributed among organisms. In vertebrates and in cephalopods
some proteins (crystallins) present in the lens are structurally related to alpha-class GSTs (Chiou
S.H. et al. Biochem. J. 309:793-800 (1995)). Furthermore, the olfactory epithelial cytosol shows the
highest GST activity among the extrahepatic tissues. The olfactory GSTs were found to catalyse
glutathione conjugation of several odorant classes, including many unsaturated aldehydes and ketones, as
well as epoxides, and were proposed to play an important role in chemoreception (Ben-Arie N. et al.
Biochem. J. 292:379-384 (1993)).

Higher cells each contain a family of many GST isozymes in each class with broad, yet
overlapping, specificity. Mu-class GSTs are thought to be involved in the detoxification of reactive
oxygen species (cyclised o-quinones) produced via oxidative metabolism of catecholamines. These toxins
are thought to be involved in neurological disorders of the nigrostriatal and mesolimbic systems
(Parkinson's disease and schizophrenia, respectively). Enzymes of the mu-class GSTs are expressed in the
substantia nigra and have preferential substrate specificity for the cyclised o-quinones formed by
catecholamine metabolism (Hansson L.O. et al. J. Mol. Biol. 287:265-276 (1999); Takahashi Y. et al. J.
Biol. Chem. 268:8893-8 (1993)). Whilst most of the GSTs share common substrates, there are distinct
differences in substrate preference between subfamilies. These enzymes have evolved as a cellular
protection system against a wide variety of electrophilic compounds, including a range of xenobiotics and
oxidative metabolism by-products (oxidized lipid, DNA and catechols), and in particular are known to
metabolise a number of environmental carcinogens.

GSTs are also known to catalyze other reactions, such as peroxidase and isomerase reactions
(Edwards R. et al. Trends in Plant Sci. 5:193-198 (2000)) as well as the addition of aliphatic epoxides
and arene oxides to glutathione; the reduction of polyol nitrate by glutathione to polyol and nitrite; certain
isomerization reactions and disulfide interchange. In addition, there are marked species differences in
catalytic activities between various purified mammalian hepatic GST mixtures. Some of them catalyse
the stereospecific chemical conversion of several pharmacological substances much less effectively than
others. For example, recombinant human GST was successfully used in the reaction of steric conversion
of 13-cis-retinoic acid to all-trans-retinoic acid (Chen H. and Juchau M.R. Biochem. J. 336:223-226
(1998)).

An increasing number of GST genes are being recognized as polymorphic. Certain alleles,
particularly those that confer impaired catalytic activity, may be associated with increased sensitivity to
toxic compounds. Genetic polymorphisms and differences in GST expression have been implicated in
individual susceptibility to certain types of cancer (for review, see Hayes JD and Strange RC, Pharmacology
61:154-166 (2000)). For example, GSTM1 deficiency predisposes to head and neck cancer, especially to
cancer of the larynx, which is particularly exposed to tobacco smoke carcinogens (Gronau S et al.
Laryngorhinootologie 79:341-344 (2000)). Conversely, overexpression of GSTs is thought to be
involved in the phenomenon of multi-drug resistance to cancer chemotherapy. One class of
electrophilic compounds that are substrates for the glutathione S-transferase enzymes is the group of
alkylating agents used in antineoplastic therapy. A common problem observed in modern cancer
chemotherapy is the appearance of chemotherapy-resistant tumor cells that, because of this resistance,
no longer respond appropriately to the antineoplastic agents. This resistance is often observed with many
drugs that have no physical or mechanistic similarities to the original agent. GST isoenzymes have been
shown to be involved in the development of drug resistance to a variety of chemotherapeutic agents such
as adriamycin, vinblastine, actinomycin D and colchicine (Beckett, et al. Adv. Clin. Chem. 30:281-380
(1993)). It has been demonstrated that a resistant population of malignant cells shows a modified pattern
of total glutathione S-transferase activity. Selection of MCF-7 breast cancer cells in adriamycin by
Batist et al., J. Biol. Chem. 261:15544-15549 (1986), yielded a subset of cells which were approximately
200 fold more resistant than the parental cells. The resistant cells were found to exhibit a 45 fold
increase in total glutathione S-transferase activity, the increase being due to the appearance of an
isozyme not expressed in the parental cell line. It was demonstrated that an increase in glutathione
S-transferase alone, an increase conditioned by the transformation of susceptible cells with a foreign DNA
construct expressing the wild-type glutathione S-transferase coding region, could increase the resistance
of cells to an antineoplastic agent. As reported in Puchalski and Fahl, Proc. Natl. Acad. Sci. USA,
87:2443-2447 (1990), expression of the rat 1-1, 3-3 and the human P1-1 isozymes of glutathione
S-transferase in COS cells increased their resistance to the agent. A recent study of the increase in
resistance of tumor cells to cytotoxic drugs or ionizing radiation has allowed the identification, using
differential display, of a new GST-related protein, p28, expressed exclusively in lymphoma cells (Kodym R.
et al. J. Biol. Chem. 274:5131-5137 (1999)). Subcellular protein fractionation revealed p28 localization
in the cytoplasm, but with thermal stress p28 relocated to the nuclear fraction of cellular proteins. The
sequence homology and the similar functional characteristics of p28 to other GST family members (in
particular relocalization in response to thermal stress and ability to bind glutathione) argue that p28 is
a new mammalian member of the GST superfamily.

Evidence suggests that the level of expression of GSTs is a crucial factor in determining the
sensitivity of cells to a broad spectrum of toxic chemicals. In humans, marked interindividual differences
exist in the expression of class alpha, mu and theta GST. For the most abundant mammalian classes of
GST, the mechanisms of transcriptional and post-translational regulation have been studied. The
alpha-, mu- and pi-classes exhibit sex-, age-, tissue-, species-, and tumor-specific
patterns of expression. In addition, GSTs are regulated by a structurally diverse range of xenobiotics and,
to date, more than 100 chemicals have been identified that induce GST (Hayes J.D. and Pulford D.J.
Crit. Rev. Biochem. Mol. Biol. 30:445-600 (1995)). A significant number of these chemicals occur
naturally and, as they are found as nonnutrient components in vegetables and citrus fruits, it is apparent
that humans are likely to be exposed regularly to such compounds. Many inducers effect transcriptional
activation of GST genes through either the antioxidant-responsive element (ARE), the xenobiotic element
(XRE), the GST P enhancer 1 (GPE), or the glucocorticoid-responsive element (GRE). Many of the
compounds that induce GST are themselves substrates for these enzymes, or are metabolized (by
cytochrome P-450 monooxygenases) to compounds that can serve as GST substrates, suggesting that
GST induction represents part of an adaptive response mechanism to chemical stress by electrophiles. It
also appears probable that GSTs are regulated in vivo by reactive oxygen species, potent inducers
capable of generating free radicals by redox-cycling; such regulation can be an adaptive response to
oxidative stress in the cell. It has been shown that GST-pi can potently and selectively inhibit activation of
jun protein by its upstream kinase (JNK); these results suggest that GST-pi can also be a regulator of signal
transduction (Monaco R. et al. J. Prot. Chem. 18:859-866 (1999)). The majority of human tumors express
significant amounts of class pi GST (Hayes & Pulford, supra).

Therefore, GSTs have medical importance due to their role in mediating drug resistance in cancer
patients. The measurement of GST isoenzymes in vitro has importance in diagnostic medicine. For
example, the measurement of the pi isoenzyme of GST in tissue specimens is useful in pathology for the
detection and diagnosis of a variety of different tumors. In addition, measurement of the alpha form of
GST in blood is useful for the detection and monitoring of a variety of different forms of liver disease (for
a detailed description of the clinical applications of GST measurements see Beckett, et al., supra).

It is believed that the proteins of SEQ ID NOs: 395 and 403 or part thereof are transferases,
probably transferring alkyl or acyl groups different from methyl group, more probably glutathione
S-transferases and, as such, play a role in cellular detoxification especially against xenobiotics and oxidative
metabolism byproducts. Preferred polypeptides of the invention are polypeptides comprising the amino
acids of SEQ ID NOs: 395 and 403 from positions 47 to 122, and 260 to 309. Other preferred
polypeptides of the invention are fragments of SEQ ID NOs: 395 and 403 having any of the biological
activities described herein. The transferase activity of the proteins of the invention or part thereof may be
assayed using any of the assays known to those skilled in the art including those described for GST
proteins as in US patents 5,866,792 and 6,096,504, which disclosures are hereby incorporated by
reference in their entireties.

To find substrates, the proteins of the invention, or part thereof, or derivative thereof, may be used
for screening libraries of compounds in any of a variety of drug screening techniques. The fragment
employed in such screening may be free in solution, affixed to a solid support, borne on a cell surface, or
located intracellularly. The formation of binding complexes, between the proteins of the invention, or part
thereof, or derivative thereof, and the agent being tested, may be measured. Antagonists or inhibitors of
the proteins of the invention may be produced using methods which are generally known in the art,
including the screening of libraries of pharmaceutical agents to identify those which specifically bind the
protein of the invention. Another technique for drug screening which may be used provides for high
throughput screening of compounds having suitable binding affinity to the proteins of the invention as
described in published PCT application WO84/03564.

The invention relates to methods and compositions using the proteins of the invention or part
thereof or derivative thereof to catalyze GST-dependent detoxification reactions in vitro or in vivo using
any methods known to those skilled in the art. For example, the proteins of the invention or part
thereof may be very useful to treat toxic byproducts such as the ones obtained in laboratory experiments,
or dietary toxins due to the use of pesticides on plants used to feed animals or humans.
Preferably, the proteins of the invention or part thereof or derivative thereof are added to a sample
containing the substrate(s) in conditions allowing detoxification, and allowed to catalyze the
detoxification of the substrate(s). In a preferred embodiment, the detoxification is carried out using a
standard assay such as those described herein.

In some of the above cited embodiments, compositions comprising the proteins of the present
invention or part thereof are added to samples as a "cocktail" with other detoxifying enzymes. The
advantage of using a cocktail of detoxifying enzymes is that one is able to detoxify a wide range of
substrates without knowing the specificity of any of the enzymes. Using a cocktail of detoxifying
enzymes also protects a sample from a wide range of future unknown toxic compounds from a vast
number of sources. For example, the proteins of the invention or part thereof are added to samples where
toxic compounds are undesirable. Alternatively, the protein of the invention or part thereof may be bound
to a chromatographic support, either alone or in combination with other detoxifying enzymes, using
techniques well known in the art, to form an affinity chromatography column. A sample containing the
undesirable substrate is run through the column to remove the substrate. Immobilizing the proteins of the
invention or part thereof on a support is particularly advantageous for those embodiments in which the
method is to be practiced on a commercial scale. This immobilization facilitates the removal of the
enzyme from the batch of product and subsequent reuse of the enzyme. Immobilization of the protein of
the invention or part thereof can be accomplished, for example, by inserting a cellulose-binding domain in
the protein. One of skill in the art will understand that other methods of immobilization could also be used
and are described in the available literature. Alternatively, the same methods may be used to identify new
substrates.

In a preferred embodiment, the invention relates to cells and plants or animals genetically
engineered to express the protein of the invention or part thereof, preferably at a high level using any
method known to those skilled in the art. Such engineered cells, animals or plants will display enhanced
detoxification of compounds. In a more preferred embodiment, expression of the proteins of the
invention or part thereof will confer resistance to herbicides to transgenic plants using techniques similar
to those described in the US patent 5,866,792.

For such embodiments, the proteins of the invention may need to be modified to enhance their
ability to react with specific substrates. These modifications can provide novel isoforms which are
specifically efficient against selected electrophilic or alkylating agents. Artificial DNA constructs
encoding and expressing such modified or mutant proteins of the invention or part thereof may be
selectively delivered into targeted cells to enhance the resistance of those cells to the alkylating or
neoplastic agents. The methods related to such modifications are described (Fahl et al., United States
Patent 6,136,605, Oct. 24, 2000). The method is based on random mutation and selection with the selection
being performed with the agent against which enhanced activity is sought. The mutation is preferably site
directed to the amino acids associated with the H-site on the enzyme, so as to favor the creation of new,
useful isoforms of the enzyme.

In another embodiment, the invention relates to compositions and methods using the proteins of
the invention or part thereof to design specific systems of artificial chemoreception as described in
Ben-Arie N. et al., supra. Such chemoreception systems could recognize odorants, xenobiotics, pesticides,
drugs and may be useful for chemical, cosmetic, pharmaceutical, forensic and any other analytical
purposes. The design of such a system may be generally based on the subtle specificity of recognition of
compounds by different GST isoenzymes. The methods to produce analytical diagnostics based on
enzyme specificity are known by those skilled in the art.

In another embodiment, the invention relates to compositions and methods using the proteins of
the invention or part thereof as ligands for substrates of interest. In a preferred embodiment, the
proteins of the invention or part thereof may be used to identify and/or quantify substrates using any
techniques known to those skilled in the art such as those described in Koonin E., supra. In another
preferred embodiment, the proteins of the invention or part thereof may be used to improve or to modify
some molecular biology methods based on protein-protein interactions, including but not limited to
two-hybrid assays, expression and purification systems based on GST fusion to heterologous proteins as
already available commercially (expression vectors or plasmids encoding fusion proteins and affinity
purification methods).

In still another embodiment, the invention relates to methods and compositions using the proteins
of the invention or part thereof as a marker protein to selectively identify tissues, preferably fetal brain for
the protein of SEQ ID NO: 403, and preferably fetal brain and ovary for the protein of SEQ ID NO: 395.
For example, the protein of the invention or part thereof may be used to synthesize specific antibodies
using any techniques known to those skilled in the art including those described herein. Such tissue-specific
antibodies may then be used to identify tissues of unknown origin, for example, forensic samples,
differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate different tissue
types in a tissue cross-section using immunochemistry.

In another embodiment of the invention, measurement of the activity or expression of the proteins
of the invention may be used for the assessment of organ status, including organ damage following
immunological or toxicological insult and diagnosis of transplant rejection, using any technique known to
those skilled in the art including those described in US patents 6,080,551 and RE35,419.

In another embodiment of the invention, the proteins of the invention or part thereof, or derivative 
thereof, may be used to diagnose, treat and/or prevent cell proliferative disorders linked to dysregulation
of gene expression of the proteins of the invention. Such disorders include but are not limited to, benign
tumors, and cancers such as adenocarcinoma; leukemia; melanoma; lymphoma; sarcoma; and cancers of 
the brain, ovary, bladder, colon, liver, small intestine, large intestine, breast, kidney, lung, and prostate. 
Diagnosis may be performed using nucleic acids or antibodies able to detect the expression of the protein
of the invention using any technique known to those skilled in the art, including the Northern blotting,
RT-PCR, immunoblotting, immunohistochemistry, and enzyme-linked immunosorbent assay (ELISA) methods
described herein. Quantities of the protein of the invention expressed in subject samples, control and
disease from biopsied tissues or body fluids or cell extracts taken from patients are compared with the 
standard values. Deviation between standard and subject values establishes the parameters for diagnosing 
disease. 

For prevention and/or treatment purposes, the expression of the proteins of the invention may be
enhanced using any methods known to those skilled in the art. For example, gene therapy techniques may
be used such as the delivery of sense promoter polynucleotide constructs for the proteins of the invention 
or part thereof using a recombinant expression vector such as a chimeric virus or a colloidal dispersion 
system (see Nelson et al. United States Patent 552,277). Alternatively, the proteins of the invention or
fragments thereof or derivatives thereof may be administered to a subject to treat or prevent cancerous
and precancerous disorders as well as proliferative disorders in general. Such disorders can include, but
are not limited to, syndromes represented by abnormal neoplastic, including dysplastic, changes of tissue,
dysplastic growths in ovary, brain, colonic, breast, prostate or lung tissues, dysplastic nevus syndromes,
polyposis syndromes, colonic polyps, precancerous lesions of the cervix (i.e., cervical dysplasia),
esophagus, lung, prostatic dysplasia, prostatic intraneoplasia, breast and/or skin and related conditions 
(e.g., actinic keratosis), whether the lesions are clinically identifiable or not. 

The invention also relates to compositions and methods using the proteins of the invention or part 
thereof or derivative thereof to decrease drug resistance in cancer chemotherapy. Inhibition of the
expression and/or activity of the proteins of the invention may be achieved using any means known to
those skilled in the art. In a preferred embodiment, gene therapy methods such as the antisense
oligonucleotide and triple helix strategies described elsewhere in the application may be used. In another preferred
embodiment, antagonists of the activity of the proteins of the invention may be used. These antagonists
may be directly administered to patients. Low-molecular-weight inhibitors (i.e., those which can be
delivered freely into the brain and which specifically inhibit GST activity) are especially preferred for use
in cancer therapy. Alternatively, artificial DNA constructs encoding peptide modulators or inhibitors of
the activity of the protein of the invention and flanking sequences effective to express the protein coding
sequence in a host cell as well as flanking regulatory sequences (such as an antioxidant responsive
element which enhances the expression of the glutathione S-transferase in the presence of antioxidant
molecules) may be used. Such artificial DNA constructs confer on recombinant cells an increased level of
resistance to an antineoplastic agent.

There is a need for selective inhibitors of GST isoenzymes for treatment of drug resistance in 
cancer patients. Thus, in a further embodiment of this invention, the proteins of the invention or 
fragments thereof can be used for screening of the compounds which are selective inhibitors or specific
inhibitors of one or more GST isoenzymes. Selective inhibition means that a compound has a greater
inhibitory effect on one isoenzyme than it does on another GST isoenzyme. Such compounds could also 
be tested and selected for their ability to overcome drug resistance to chemotherapeutic agents (see Jones
et al. United States Patent 6,103,665 Aug 2000). For example, mammalian cell lines that have been made
resistant to particular chemotherapeutic drugs can be used to identify haloenol lactone compounds that
render the lines sensitive to the chemotherapeutic agents. Such cell lines are known to those of skill in the
art and can be obtained for example from the American Type Culture Collection, Rockville, Md., USA.

Protein of SEQ ID NO: 306 (internal designation 187-38-0-0-110-CS)

The protein of SEQ ID No: 306, herein referred to as GRP, encoded by the cDNA of SEQ ID No: 65,
herein referred to as GRP2, is homologous to bovine glutamic-acid rich protein (GARP) (GENEPEPT ID:
M61185). The protein of the invention is overexpressed in the brain and fetal brain, lymph ganglia and
thyroid.

The protein of the invention exhibits homology with bovine glutamic-acid rich protein (GARP) 
(18 % identical amino acids, 28% positive amino acids when aligned by BLASTP 2.0.9). 

GARP proteins have been identified as multivalent proteins that interact with the key players of cGMP
signaling, phosphodiesterase and guanylate cyclase. GARP proteins are closely associated with cyclic
nucleotide-gated channels (CNGs), which make up a family of nonselective cation channels found in a
variety of tissues. The beta subunit of CNGs has a unique bipartite structure, containing a
membrane-spanning region (beta part) and a GARP part (GARP). GARP is highly homologous to a soluble splice
form, GARP1, and a splice variant lacking the C-terminal glutamic-acid-rich region. Experiments using
GARP attached to affinity columns showed that phosphodiesterases are highly retained by the column
[Korschen HG et al., Nature, 400(6746):761-766 (1999)]. Moreover, Korschen et al. demonstrated that
GARP inhibits both soluble and membrane bound phosphodiesterase. 

Cyclic nucleotides are involved in regulating the activity of airway smooth muscle and many 
other cells in the airways, including pro-inflammatory, immunocompetent cells such as macrophages, 
eosinophils, mast cells and lymphocytes. Cyclic nucleotides are inactivated by the action of cyclic
nucleotide phosphodiesterase enzymes (PDE). Inhibition of cGMP PDE results in elevation of cGMP
levels; elevated cGMP levels are associated with beneficial anti-platelet, anti-neutrophil, anti-vasospastic 
and vasodilatory activity. 

Thus, the subject invention provides a polypeptide having the sequence of SEQ ID No: 306 or a 
GRP polypeptide encoded by the human cDNA of clone 187-38-0-0-110. In a preferred embodiment,
GRP is encoded by the sequence of SEQ ID No: 65 or the human cDNA of clone 187-38-0-0-110;
however, all polynucleotides encoding the polypeptides of the invention are included. As used herein,
"the GRP protein" includes the full length protein of SEQ ID NO: 306 as well as biologically active 
fragments of the GRP protein. Also encompassed by the phrase "the GRP protein" are variants of the 
protein of SEQ ID NO: 306 and biologically active fragments of said variant proteins.

"Biologically active fragments" are defined as those peptide or polypeptide fragments of GRP
which have at least one of the biological functions of the full length protein (e.g., the ability to inhibit the
activity of PDE or serve as an affinity substrate for PDEs). 

The invention also provides variants of the GRP protein of SEQ ID NO:
306. These variants have at least about 80%, more preferably at least about 90%, and most preferably at
least about 95% amino acid sequence identity to the amino acid sequence of GRP. Variants according to
the subject invention also have at least one functional or structural characteristic of GRP, such as the
ability to inhibit the activity of PDE or serve as an affinity substrate for PDEs. The invention also
provides biologically active fragments of the variant proteins. Unless otherwise indicated, the methods
disclosed herein can be practiced utilizing GRP or variants thereof. Likewise, the methods of the subject
invention can be practiced using biologically active fragments of GRP, or biologically active fragments of 
GRP variants. 

Assays related to the inhibitory effect of the protein of the invention can be carried out using 
techniques described in U.S. Patent No. 6,130,333, hereby incorporated by reference in its entirety; or by
any other technique known to those skilled in the art.

One aspect of the subject invention provides compositions and methods of using the nucleotide 
sequence of SEQ ID NO: 65, or its complement, in molecular biology techniques. These techniques 
include, but are not limited to: the use of segments of GRP2 as oligomers for PCR; expression of
GRP2 and the production of recombinant proteins; the generation of antisense RNA and DNA, their
chemical analogs and the like; and the use of GRP2 segments as hybridization probes and in chromosome
gene mapping. 

For example, the nucleotide sequence of SEQ ID No: 65, or its complement, can be used to generate
hybridization probes for mapping the naturally occurring genomic sequence. The sequence can be
mapped to a particular chromosome or to a specific region of the chromosome using well known
techniques. These include in situ hybridization to chromosomal spreads, flow-sorted chromosomal
preparations, or artificial chromosome constructions such as yeast artificial chromosomes, bacterial
artificial chromosomes, bacterial P1 constructions, or single chromosome cDNA libraries as reviewed in
Price (Price CM - Blood Rev. - 1993, 7(2):127-34) and Trask (Trask BJ - Trends Genet. - 1991,
7(5):149-54).

In situ hybridization of chromosomal preparations and physical mapping techniques, such as
linkage analysis using established chromosomal markers, are invaluable in extending genetic maps;
genetic maps provide valuable information to investigators searching for disease-causing genes using
positional cloning or other gene discovery techniques. The nucleotide sequence of the present invention
can also be used to detect differences in the chromosomal location due to translocation, inversion, etc.
among normal, carrier or affected individuals.

Another embodiment of the subject invention provides pharmaceutical compositions comprising 
the GRP protein and pharmaceutically acceptable carriers. These pharmaceutical compositions can be 
used in prophylaxis and/or treatment of a variety of conditions where inhibition of phosphodiesterase is 
considered to be beneficial. The biochemical, physiological, and clinical effects of phosphodiesterase
inhibitors suggest their utility in a variety of disease states in which modulation of smooth muscle, renal,
hemostatic, inflammatory, and/or endocrine function is desirable. Therefore, the GRP protein can be used
for the treatment or prophylaxis of a number of disorders and conditions including, but not limited to,
stable, unstable, and variant (Prinzmetal) angina; hypertension; pulmonary hypertension; congestive heart
failure; acute respiratory distress syndrome; acute and chronic renal failure; atherosclerosis; conditions of
reduced blood vessel patency (e.g., postpercutaneous transluminal coronary or carotid angioplasty, or
post-bypass surgery graft stenosis); peripheral vascular disease; vascular disorders, such as Raynaud's
disease, thrombocythemia, intermittent claudication; immune diseases, multiple sclerosis; cancers;
inflammatory diseases, graft versus host disease, Alzheimer's disease, memory deficits, stroke,
bronchitis, chronic asthma, acute lung injury, chronic obstructive pulmonary disease, allergic asthma,
allergic rhinitis; glaucoma; osteoporosis; preterm labor; benign prostatic hypertrophy; male and female 
erectile dysfunction; and diseases characterized by disorders of gut motility (e.g., irritable bowel 
syndrome). 

The GRP protein of the invention can also provide beneficial anti-platelet, anti-neutrophil,
anti-vasospastic, vasodilatory, natriuretic, and diuretic activities when administered in therapeutically effective
amounts. The GRP protein can also potentiate the effects of endothelium-derived relaxing factor (EDRF),
gastric NO administration, nitrovasodilators, atrial natriuretic factor (ANF), brain natriuretic peptide
(BNP), C-type natriuretic peptide (CNP), and endothelium-dependent relaxing agents such as bradykinin
and acetylcholine, when administered to an individual.

Another embodiment of the subject invention provides methods of treating male erectile dysfunction
comprising the administration of therapeutically effective amounts of the GRP protein using appropriate
methods known to the skilled artisan.

Another embodiment of the subject invention provides industrially significant methods of
recovering PDE comprising contacting solutions containing PDE with immobilized GRP protein. In this
aspect of the invention, the GRP protein is immobilized onto a solid support and allowed to specifically
bind to PDE contained in a solution or sample. PDE can then be eluted from the immobilized GRP
protein according to methods known to the skilled artisan (see, for example, Korschen et al., supra). PDE
is a commercially valuable commodity sold by various vendors.

Protein of SEQ ID NO: 302 (internal designation 187-2-2-0-A3-CS)

The protein of SEQ ID NO: 302 encoded by the cDNA of SEQ ID No: 61 is related to a
neuronally expressed protein (neuritin, Genseq accession number W37859) known to have a role in
neurogenesis and axonal and dendritic growth.

The 164 amino acid protein of SEQ ID NO: 302 is 24% identical to neuritin over the complete 
sequence. Specifically, SEQ ID NO: 302 displays two blocks of strong homology to neuritin (amino 
acids 41-60 of SEQ ID NO: 302 display 55% identity and 95% similarity to amino acids 30-49 of 
neuritin, and amino acids 66-117 of SEQ ID NO: 302 display 32% identity and 57% similarity to amino
acids 62-113 of neuritin). The C-terminal portion of neuritin (aa 116-142 of neuritin) is highly
hydrophobic and contains a cleavage site found in GPI-anchored proteins. The protein of SEQ ID 
NO: 302 also has a hydrophobic C-terminus (21 out of the last 30 amino acids are hydrophobic) and 
conforms to the GPI anchor consensus sequence. 
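
The identity figures above follow from position-by-position counting over an aligned block; for example, 55% identity over a 20-residue block corresponds to 11 identical positions, and 21 hydrophobic residues out of 30 corresponds to 70%. A minimal sketch in Python, assuming two pre-aligned, ungapped blocks of equal length; the function names, the toy sequences, and the set of residues treated as hydrophobic are illustrative assumptions and are not taken from SEQ ID NO: 302 or from neuritin:

```python
# Illustrative sketch only: the names, the toy sequences, and the residue set
# below are assumptions, not data from the sequence listing.
HYDROPHOBIC = set("AVLIMFWC")  # assumed classification; other scales differ


def percent_identity(block_a: str, block_b: str) -> float:
    """Percent of positions with identical residues in two ungapped, aligned blocks."""
    if len(block_a) != len(block_b) or not block_a:
        raise ValueError("blocks must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(block_a, block_b) if a == b)
    return 100.0 * matches / len(block_a)


def hydrophobic_tail_count(protein: str, tail_length: int = 30) -> int:
    """Number of hydrophobic residues among the last `tail_length` residues."""
    if tail_length <= 0:
        raise ValueError("tail_length must be positive")
    tail = protein[-tail_length:]
    return sum(1 for residue in tail if residue in HYDROPHOBIC)


if __name__ == "__main__":
    # Toy 10-residue blocks that differ at a single position.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    # Toy sequence with a hydrophobic C-terminal stretch.
    print(hydrophobic_tail_count("KKKKLLLL", tail_length=4))
```

Similarity percentages additionally require a substitution matrix to decide which non-identical pairs count as positive, which this sketch does not attempt.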

Neuritin, also known as candidate plasticity-related gene number 15 (cpg-15), was independently
identified by two groups from differential cDNA libraries generated from kainic acid-treated hippocampal
cells (Nedivi et al., Nature. 363:718-22 (1993); Naeve et al., Proc. Natl. Acad. Sci. USA. 94:2648-53
(1997)). Neuritin is a secreted protein that contains a potential GPI anchoring domain believed to anchor
the protein to the membranes of target cells. Neuritin is expressed strongly in the brain, and in particular,
in systems with pronounced developmental plasticity, including the pyramidal neurons of the cornu
ammonis and the granule cells of the hippocampus dentate gyrus. Strong expression is also observed in
layers of tenia tecta projecting to the olfactory bulb, the major target of the retinal ganglion cells, and the
optic nerve layer of the superior colliculus (optic tectum); and localized expression is observed in the
thalamic nuclei and the cerebral cortex (Nedivi et al., Proc. Natl. Acad. Sci. USA. 93:2048-53 (1996);
Naeve et al., Proc. Natl. Acad. Sci. USA. 94:2648-53 (1997); Nedivi et al., Science. 281:1863-66 (1998)).
Neuritin mRNA is expressed throughout development and persists into adulthood. In addition, neuritin expression
is upregulated in adults by brain derived neurotrophic factor (BDNF). Neuritin mRNA is also detected in 
the lung and the liver, although at lower levels than that observed in the CNS (Naeve et al., Proc. Natl. 
Acad. Sci. USA. 94:2648-53 (1997)). 

Functional studies on the neuritin protein have revealed a role in neuronal growth. In one such
study, rat cortical and hippocampal neurons were treated with recombinant forms of neuritin. Neurons
treated with neuritin showed extensive neuritogenesis over control cultures. Specifically, neurons showed
well-differentiated cell bodies with well-defined extensions after treatment with neuritin (Naeve et al.,
Proc. Natl. Acad. Sci. USA. 94:2648-53 (1997)). Other studies using frog optic tectum showed that
transfection of tectum cells with neuritin cDNA can increase the growth rate of tectal cell dendrites
(Nedivi et al., Science. 281:1863-66 (1998)). Studies have also shown that neuritin can modify the
growth of retinotectal axons by increasing the elaboration of presynaptic axons and can promote the 
maturation of retinal tectal synapses (Cantallops et al., Nature Neuroscience. 3:1004-1011 (2000)). 
Together, these results indicate that neuritin promotes the growth of pre- and post-synaptic neurons and 
contributes to the formation and stabilization of mature synapses.

The subject invention provides the polypeptide of SEQ ID NO: 302 and polynucleotide sequences
encoding the amino acid sequence of SEQ ID NO: 302. In one embodiment, the polypeptides of SEQ ID
NO: 302, including fragments, variants, etc. are replaced by the corresponding polypeptide encoded by
the human cDNA of clone 187-2-2-0-A3-CS. Also included in the invention are biologically active
fragments of the protein of SEQ ID NO: 302 and polynucleotide sequences encoding these biologically
active fragments. In another embodiment, biologically active fragments comprise amino acid positions
41-60, 66-117, 41-117, and 41-164. In another embodiment, these fragments may be joined together by
chemical linkers or by recombinantly inserted amino acid linker segments according to methods known in
the art. "Biologically active fragments" are defined as those peptide or polypeptide fragments of SEQ ID
NO: 302 which have at least one of the biological functions of the full length protein (e.g., the ability to
stimulate neuritogenesis and axonal and dendritic growth).

The invention also provides variants of SEQ ID NO: 302. These variants have at least about 
80%, more preferably at least about 90%, and most preferably at least about 95% amino acid sequence 
identity to the amino acid sequence of SEQ ID NO: 302. Variants according to the subject invention also
have at least one functional or structural characteristic of SEQ ID NO: 302, such as the biological
functions described above. The invention also provides biologically active fragments of the variant
proteins. Unless otherwise indicated, the methods disclosed herein can be practiced utilizing the
polypeptide of SEQ ID NO: 302 or variants thereof. Likewise, the methods of the subject invention can
be practiced using biologically active fragments of the protein of SEQ ID NO: 302 or variants of said biologically
active fragments.

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode 
SEQ ID NO: 302. It is well within the skill of a person trained in the art to create these alternative DNA 
sequences, which encode proteins having the same, or essentially the same, amino acid sequence. These 
variant DNA sequences are, thus, within the scope of the subject invention. As used herein, reference to
"essentially the same sequence" refers to sequences that have amino acid substitutions, deletions,
additions, or insertions that do not materially affect biological activity. Fragments retaining one or more
characteristic biological activities of SEQ ID NO: 302 are also included in this definition.

"Recombinant nucleotide variants" are alternate polynucleotides which encode a particular 
protein. They can be synthesized, for example, by making use of the "redundancy" in the genetic code.
Various codon substitutions, such as the silent changes which produce specific restriction sites or codon
usage-specific mutations, can be introduced to optimize cloning into a plasmid or viral vector or 
expression in a particular prokaryotic or eukaryotic host system, respectively. 
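
The redundancy described above can be illustrated with a short sketch in Python. It builds the standard genetic code, enumerates every DNA sequence that encodes a given peptide, and picks out the silent variants that contain a chosen restriction site. The function names and the two-residue example peptide are hypothetical and chosen only for illustration; the EcoRI recognition sequence GAATTC is used as the example site.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(SYNONYMS[aa])
    return total


def synonymous_sequences(peptide: str):
    """Yield every DNA sequence encoding the peptide (practical for short peptides only)."""
    for codons in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(codons)


if __name__ == "__main__":
    site = "GAATTC"  # EcoRI recognition sequence
    # Hypothetical dipeptide Glu-Phe: four encodings, one of which carries the site.
    print(count_encodings("EF"))
    print([seq for seq in synonymous_sequences("EF") if site in seq])
```

Because the number of encodings grows multiplicatively with peptide length, exhaustive enumeration is only workable for short windows; for a full-length coding sequence one would scan local windows around the desired site instead.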

The protein of SEQ ID NO: 302, and variants thereof, can be used to produce antibodies 
according to methods well known in the art. The antibodies can be monoclonal or polyclonal. Antibodies
can also be synthesized against immunogenic fragments of SEQ ID NO: 302, as well as variants thereof,
according to known methods. The subject invention also provides antibodies which specifically bind to 
biologically active fragments of SEQ ID NO: 302 or biologically active fragments of SEQ ID NO: 302 
variants. 

The protein of SEQ ID NO: 302 can be utilized to treat diseases and disorders of the central or
peripheral nervous system which arise from alterations in the pattern of expression of the protein of SEQ
ID NO: 302. In this aspect of the subject invention, compositions comprising the protein of SEQ ID NO:
302 and a pharmaceutical carrier are administered to an individual in need thereof. Alternatively, in cases
where the protein of SEQ ID NO: 302 is overexpressed, reductions in SEQ ID NO: 302 levels may be
accomplished by a variety of methods known to those of skill in the art. These methods include the
introduction of neutralizing antibodies or the use of antisense polynucleotides derived from the protein of 
SEQ ID NO: 302 or clone 187-2-2-0-A3-CS. 

The subject invention also provides materials and methods for the treatment of neurological
disorders comprising contacting neuronal cells with compositions comprising the protein of SEQ ID NO:
302 and pharmaceutically acceptable carriers. Thus, this aspect of the invention provides methods of
treating patients suffering from a variety of neurological disorders, conditions, and/or diseases of the
central, autonomic, or peripheral nervous system. These include neurological damage arising from
congenital disease, trauma, surgery, stroke, ischemia, infection, metabolic disease, nutritional deficiency,
malignancy, and/or exposure to toxic agents. Additional examples of such disorders include, but are not
limited to, epilepsy, cerebral neutralisms, Alzheimer's disease, Pick's disease, Huntington's disease,
dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other
motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias,
multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural
empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral
central nervous system disease, prion diseases, Creutzfeldt-Jakob disease, Gerstmann-Straussler-Scheinker
syndrome, fatal familial insomnia, diabetes induced peripheral neuropathy or neuropathy induced by other
metabolic disorders or nutritional deficiencies, neurofibromatosis, tuberous sclerosis, cerebelloretinal
hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental
disorders of the central nervous system, cerebral palsy, neuroskeletal disorders, autonomic nervous
system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other
neuromuscular disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic
myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and
schizophrenic disorders, seasonal affective disorder, akathisia, amnesia, and/or other dystrophies or
degenerative disorders of the visual, sensory, olfactory, auditory, motor, or memory systems. Methods of
introducing therapeutic compounds into cells are known to those skilled in the art. Non-limiting
examples include the use of targeted liposomes, fusogenic liposomes, or other carriers suitable for the 
introduction of a therapeutic compound into a target cell. 

The subject invention also provides materials and methods for the treatment of neurological
disorders comprising contacting neuronal cells with compositions comprising polynucleotides encoding
the protein of SEQ ID NO: 302 and pharmaceutically acceptable carriers. In one embodiment, the
polynucleotide is clone 187-2-2-0-A3-CS. Methods of introducing polynucleotides into cells and 
directing expression of the polynucleotide are known to those skilled in the art. 

Antibodies raised against the protein of SEQ ID NO: 302 may be used in a variety of
immunoassays known to those skilled in the art. In this aspect of the invention, immunoassay screening
for abnormal levels of the protein of SEQ ID NO: 302 can be used as screens or diagnostic/prognostic 
indicators of neurodegenerative disease. 

Antibodies raised against the protein of SEQ ID NO: 302, fragments, and/or derivatives thereof
may also be used for detection and identification of growing and differentiating neurons including, but not
limited to, the pyramidal neurons of the cornu ammonis, the granule cells of the hippocampus dentate
gyrus, neurons in layers of tenia tecta projecting to the olfactory bulb, the optic nerve layers of the
superior colliculus (optic tectum), and neurons of the thalamic nuclei and the cerebral cortex.

Protein of SEQ ID NO: 301 (internal designation 187-12-4-0-A8-CS)

The protein of SEQ ID No:301, encoded by the cDNA of SEQ ID NO:60, is homologous to the
Eukaryotic cell growth inhibiting factor (GENESEQP: R95950) described in patent WO 96/17933. The
protein of the invention is highly expressed in the brain, fetal brain, fetal liver and the prostate. 

It is believed that the protein of the invention is a cell growth inhibiting factor. Preferred
polypeptides of the invention are those that comprise amino acids 221 to 287. Other preferred
polypeptides of the invention are any fragment of SEQ ID NO: 301 having any of the biological activities
described herein. In the present invention, a cell growth inhibiting factor is defined as a peptide or protein that
decreases, suppresses or terminates (reversibly or irreversibly) the growth of at least one type of cell such
as, but not limited to, bacteria, yeast, vertebrate cells, mammalian cells and human cells, under ordinary
culturing conditions known to those skilled in the art. Assay of the inhibiting activity of the invention can
be carried out, for example, by evaluating the decrease in DNA synthesis as described in Patent WO
96/17933, or by measuring the number or density of cells using any standard method. For example,
fibroblasts are transfected with a vector containing the DNA sequence coding for the protein of the
invention or part thereof. Cells are then cultured in a standard medium, exposed to tritiated thymidine,
and further cultured. The cultures are then fixed and stained with X-Gal; the blue-stained
galactosidase-expressing cells are counted under a microscope, and the ratio of cells showing dark particles in their
nuclei due to tritiated thymidine uptake is determined. DNA synthesis inhibitory rates are calculated from
the labeling index, taking as reference (i.e. 0% inhibition) a culture of cells transfected with a "blank"
vector (i.e. not modified to contain the DNA coding for the protein of the invention or part thereof).
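
The calculation described above can be sketched as follows. This is a minimal illustration in Python, assuming that the labeling index is the fraction of counted X-Gal-positive cells whose nuclei are labeled by tritiated thymidine, and that the inhibitory rate is expressed relative to the blank-vector culture as (1 - LI_test / LI_blank) x 100. The function names and the example counts are hypothetical and are not taken from WO 96/17933.

```python
def labeling_index(labeled_cells: int, counted_cells: int) -> float:
    """Fraction of counted (X-Gal-positive) cells with thymidine-labeled nuclei."""
    if counted_cells <= 0:
        raise ValueError("counted_cells must be positive")
    if not 0 <= labeled_cells <= counted_cells:
        raise ValueError("labeled_cells must lie between 0 and counted_cells")
    return labeled_cells / counted_cells


def inhibition_rate(test_index: float, blank_index: float) -> float:
    """Percent inhibition of DNA synthesis relative to the blank-vector culture.

    The blank-vector culture defines 0% inhibition (assumed normalization).
    """
    if blank_index <= 0:
        raise ValueError("blank_index must be positive")
    return (1.0 - test_index / blank_index) * 100.0


if __name__ == "__main__":
    # Hypothetical counts: 80 of 100 cells labeled with the blank vector,
    # 20 of 100 labeled with the vector encoding the protein of interest.
    blank = labeling_index(80, 100)
    test = labeling_index(20, 100)
    print(f"{inhibition_rate(test, blank):.1f}% inhibition")
```

A negative value from `inhibition_rate` would indicate more labeled nuclei than in the blank-vector control, i.e. no inhibition under this normalization.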

Aging at the cell level is associated with individual aging. The maximum possible number of
divisions (division life span) of cultured cells is inversely proportional to individual age. Even if an aged
cell is fused with a young or immortalized cell, DNA synthesis does not occur again in aged cells; on the
contrary, DNA synthesis in the young and immortalized cell is suppressed (Stein GH et al. - Proc Nat
Acad Sci. - 1981, 78:p3025). This demonstrates that certain factors controlling cellular senescence are
dominant, and that aged cells not only lack substances essential for their growth but also have substances
that actively suppress DNA synthesis. Moreover, microinjections of mRNA, prepared from an aged cell, 
are known to inhibit DNA synthesis (Lumpkin CK et al. - Science - 1986, 232:p393). Therefore as cells 
age, there are some genes that are newly expressed or whose expression is increased. Such genes play an 
important role, directly or indirectly, in cell aging. 

Pereira-Smith et al. tested the complementation of a large number of immortalized human cells in
fused pairs and demonstrated the presence of 4 groups of human aging genes (Pereira-Smith OM et al. -
Proc Nat Acad Sci. - 1988, 85:p6042). Clarifying the nature of aging-associated genes is not only
important in understanding aging, both at the cellular and individual levels, but is also significant in that
the use of these genes or gene products would enable the diagnosis of various aging-associated diseases
and diseases caused by cellular senescence, the development of prophylactic/therapeutic drugs for such
diseases, and their application as prophylactic/therapeutic drugs for various diseases involving 
uncontrolled cell growth such as, but not limited to, cancers. 

In one embodiment of the present invention, the polypeptides and polynucleotides of the 
invention are used to specifically label cells of the brain, fetal brain, fetal liver and the prostate, as the
protein is strongly expressed in these tissues. The ability to specifically detect these tissues, and cells
derived from these tissues, has a number of uses, including for the determination of the history of tumor 
cells and for histological analyses. 

An embodiment of the present invention relates to methods and compositions of using
the protein of SEQ ID NO:301 or the cDNA of SEQ ID NO:60 or any part thereof, to inhibit cell
proliferation in vitro. For example, by including the invention in a "cocktail" with other proteins (such as
proteases) it could be used as a decontaminant, i.e. to prevent the growth of any cells to maintain a sterile
environment. Preferred applications of this embodiment include decontamination of samples (such as cell
culture media) and instruments (such as surgical instruments), where the invention would be used as a
bacteriostatic/mycostatic agent. Another example pertains to the use of the protein of the invention as a
reagent for terminating the cell cycle of cultured cells at a given time point, e.g., as a reagent for
synchronizing cell division, avoiding the need to isolate specific cells (e.g. at the desired cell cycle phase)
in cultures using techniques such as flow cytometry. Synchronization of the cell cycle could, for
example, be achieved by transfecting cells with an appropriate vector containing the DNA coding
for the protein of the invention or part thereof, where expression of the protein results in growth
inhibition. Then, after a certain time, an inhibitor of the protein could be administered in order to enable
the cells to resume growth (e.g. all at the S phase, when DNA is synthesized). Further, the ability to
synchronize the cell cycle in an in vitro experimental system would provide improved assay precision or
would facilitate any laboratory procedure or experiment involving a particular cell cycle stage. Use of the
invention for in vitro inhibition of cell proliferation is not limited to the above examples; the invention is
potentially useful in any in vitro application that requires the inhibition of cellular proliferation. 

Another embodiment of the invention pertains to the introduction of SEQ ID NO:301 or SEQ ID
NO:60, or any part thereof, into a target tissue (such as skin or vascular endothelium) to establish an in
vitro aged cell line of the target tissue. Such a cell line is useful as a screening system for clarifying the
mechanisms of aging and/or cellular senescence, and also for seeking prophylactic and therapeutic drugs
for aging-associated diseases and/or diseases caused by cellular senescence, as it provides an in vitro
model of aged cells (in vitro cell cultures provide the advantage of being easily produced and are not as
expensive as animal models). Thus these cells are potentially useful in drug candidate screening
applications; a preferred application involves their use in high-throughput screening, e.g. to identify "lead
compounds."

A preferred embodiment of the invention relates to the use of the cDNA of SEQ ID NO:60 or part 
thereof, as a probe for examining individual aging at the gene expression level. Specifically, SEQ ID 
NO:60, or part thereof, can be used as a diagnostic reagent for various aging-associated diseases such as, 
but not limited to: arteriosclerosis, osteo-arthritis, dementia (including Alzheimer's disease) and 
Parkinson's disease.

In a related embodiment, the cDNA of SEQ ID NO:60 or part thereof, could be used to synthesize 
antisense oligonucleotides by methods well known to those skilled in the art. Antisense oligonucleotides 
can be used to inhibit the synthesis of the protein of SEQ ID NO:301, thereby preventing cell and tissue 
aging and/or promoting the rejuvenation of aged cells and tissues. These antisense oligonucleotides can 
also be used for in vivo or ex vivo treatment and prophylaxis of diseases caused by cellular senescence or
aging-associated diseases such as, but not limited to: arteriosclerosis, osteo-arthritis, dementia (including 
Alzheimer's disease) and Parkinson's disease. 
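
By way of illustration only, the design of an antisense oligonucleotide from a cDNA region can be sketched as taking the reverse complement of a chosen window of the sense strand. The function name antisense_oligo, the window coordinates and the short example sequence below are hypothetical placeholders introduced for this sketch; the example sequence is not SEQ ID NO:60, and practical oligonucleotide design would involve further criteria that are not addressed here.

```python
# Illustrative sketch only: the sequence below is a made-up placeholder,
# not SEQ ID NO:60, and the helper name is hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_oligo(cdna: str, start: int, length: int) -> str:
    """Return the reverse complement of cdna[start:start + length].

    `start` is a 0-based offset into the sense-strand cDNA sequence.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    target = cdna[start:start + length]
    if len(target) != length:
        raise ValueError("requested window extends past the end of the sequence")
    # Complement each base, then reverse so the oligo reads 5' to 3'.
    return target.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    example_cdna = "ATGGCCAAGTTGACC"  # placeholder sense-strand sequence
    print(antisense_oligo(example_cdna, 0, 9))  # prints CTTGGCCAT
```

The returned oligonucleotide is complementary to the selected window of the sense strand and would therefore be expected to hybridize to the corresponding region of the transcript.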

In a most preferred embodiment, SEQ ID NO:301, SEQ ID NO:60, or any part thereof, can be 
used as a pharmaceutical drug to treat pathologies such as, but not limited to, cancers, inflammation, or
infections. For example, when used as an antibacterial, antiviral and/or antifungal agent, inhibition of
microbial proliferation could be achieved by directly inhibiting either microorganism growth (in the case
of fungal and bacterial infections) or DNA synthesis of infected cells (in the case of viral infections). The
DNA of SEQ ID NO:60 or part thereof, can also be used to develop gene therapy products for the in vivo
or ex vivo treatment of diseases and conditions such as cancers and inflammation. SEQ ID NO:301, SEQ
ID NO:60, or any part thereof, may also be used: 1) as a probe for the diagnosis, 2) as a prophylaxis or 3)
as a treatment of aging-associated diseases such as, but not limited to: arteriosclerosis, osteo-arthritis,
dementia (including Alzheimer's disease) and Parkinson's disease. In a related embodiment, the protein
of SEQ ID NO:301 or the cDNA of SEQ ID NO:60, or any part thereof, can be used in the development
of drugs that will be used in the prophylaxis or treatment of the diseases stated above (e.g. diseases caused
by cellular senescence, aging-associated diseases and diseases caused by cellular proliferation).

For the treatment or prevention of diseases and conditions associated with undesired 
proliferation, such as cancer, inflammation, or infection, the expression or activity of the present protein 
can be increased using any of a number of methods. For example, polynucleotides encoding the protein
can be introduced into the undesired cells, wherein the protein is then expressed and inhibits the further
growth of the cells. In one such embodiment, the polynucleotides can be incorporated into liposomes
comprising on their surface a specific molecule that directs the targeting of the liposome to a specific cell
type (e.g. a tumor-specific antibody). Alternatively, the protein of SEQ ID NO:301 can itself be
administered to the cells, e.g. as a fusion protein also comprising a specific targeting polypeptide moiety.
Further, a compound that enhances the expression or activity of the protein can be administered to cells,
preferably in a way that specifically targets the compound to undesired cells, e.g. chemically linked to a 
heterologous specific targeting molecule. 

Protein of SEQ ID NO: 412 (internal designation 187-5-3-0-C7-CS)

The protein of SEQ ID NO: 412 encoded by the cDNA of SEQ ID NO: 171 is homologous to
the human CDK4-binding protein p34SEI-1 (sptrembl accession number Q9UHV2). p34SEI-1 is a new
CDK4 regulator that prevents p16INK4a from inhibiting the formation of cyclin D1-CDK4 complexes.
p34SEI-1 seems to act as a growth factor sensor and may facilitate the formation and activation of cyclin D-
CDK complexes in the face of inhibitory levels of INK4 proteins (Sugimoto et al., Genes Dev. 13:3027-
3033 (1999)).

Progression through the cell cycle is a complex process that is regulated at many levels by several 
proteins. The activity of cyclin dependent kinases (CDK4 and CDK6) is regulated by association with a
cyclin partner that acts as a positive effector and by two families of inhibitors, the cdk inhibitor proteins
(KIP) and the inhibitors of cdk4 (INK4) such as p16INK4a, which act as negative effectors (Sandhu et al.,
Cancer Detect. Prev. 24:107-118 (2000)).

Cancer is a disease characterized by loss of cellular growth control, and the molecular machinery of
the cell cycle is involved in tumorigenesis. Many human tumors have been shown to have abnormalities in
this pathway resulting in either the functional inactivation of p16INK4a or the excessive activity of CDK4
(Palmero et al., Cancer Surv. 27:351-357 (1996)).

It is believed that the protein of SEQ ID NO: 412 plays a role in cell cycle regulation via
binding to a cyclin dependent kinase. Other preferred polypeptides of the invention are fragments of SEQ
ID NO: 412 having any of the biological activities described herein. The binding activity of the protein of
the invention or part thereof to a cyclin dependent kinase, as well as its role in the cell cycle, may be assayed
using any of the assays known to those skilled in the art including those described in Sugimoto et al., 
supra. 

An embodiment of the present invention relates to methods of using the protein of the invention
or part thereof to identify and/or quantify cyclin dependent kinases, preferably CDK4, in a biological
sample. The protein can thus be used in assays and diagnostic kits for the quantification of such CDKs in
bodily fluids, in tissue samples, and in mammalian cell cultures. The binding activity of the protein of the
invention or part thereof may be assessed using the assay described in Sugimoto et al., supra or any other
method familiar to those skilled in the art. Preferably, a defined quantity of the protein of the invention or part
thereof is added to the sample under conditions allowing the formation of a complex between the protein
of the invention or part thereof and the cyclin dependent kinase to be identified and/or quantified. Then,
the presence of the complex and/or the free protein of the invention or part thereof is assayed and
optionally compared to a control using any of the techniques known by those skilled in the art.

In another embodiment, the invention relates to compositions and methods using the protein of 
the invention or part thereof to stimulate cell proliferation both in vitro and in vivo. For example, soluble
forms of the protein of the invention or part thereof may be added to cell culture medium in an amount
effective to stimulate cell proliferation. 

The invention further relates to methods and compositions using the protein of the invention or
part thereof to diagnose, prevent and/or treat disorders associated with cell proliferation including,
but not limited to, adenocarcinoma, sarcoma, lymphoma, leukemia, melanoma, myeloma,
teratocarcinoma, cancers of the adrenal gland, bladder, bone, brain, breast, gastrointestinal tract, heart,
kidney, liver, lung, ovary, pancreas, paraganglia, parathyroid, prostate, salivary gland, skin, spleen, testis,
thyroid, uterus, and neurodegenerative disorders such as Alzheimer's disease (McShea et al.,
Am. J. Pathol. 150(6):1933-1939 (1997)). For diagnostic purposes, the protein of the
invention could be quantified in biological samples, using Northern blotting, RT-PCR, immunoblotting or
any of the protocols known in the art, and compared to the expression in control biological samples.
Thus a diagnostic assay may be used to determine altered expression of the protein of the invention, to
correlate it with disease states and to evaluate its prognostic significance in diseases. For prevention
and/or treatment purposes, inhibition of the endogenous expression of the protein of the invention using
any of the antisense or triple helix methods described herein may be used. Alternatively, inhibitors of the
protein's activity may be developed and used to inhibit and/or to reduce the protein's activity using any
methods known to those skilled in the art. Antibodies which specifically bind to the protein of the
invention may be generated using methods that are well known in the art and used as antagonists.

Protein of SEQ ID NO: 299 (internal designation 184-1-4-0-C11-CS)

The protein of SEQ ID NO: 299 encoded by the cDNA of SEQ ID NO: 58 and found in fetal
liver and liver, is orthologous to the BolA protein. The BolA family comprises the morpho-protein BolA
from E. coli and its various homologs. The expression of BolA is growth rate regulated and is induced
during the transition into the stationary phase. BolA is also induced by stress during early stages of
growth and can have a general role in stress response. It has also been suggested that BolA can induce
the transcription of penicillin binding proteins 6 and 5 (EMBO J. 1;8(2):3923-31 (1989)).

E. coli cells become thinner and shorter after a period of starvation or stationary-phase conditions;
this altered morphology is an adaptive response of E. coli to general forms of stress. The bolA gene
seems to be involved in the switching between cell elongation and septation systems during the cell
division cycle [J Bacteriol 170:5169-5176 (1988)]. The regulation of bolA has been linked to the
presence of a gearbox promoter from which RNA is transcribed [Mol Microbiol 5:2085-2091 (1991)].

Expression of bolA is governed by two promoters. P2 is located further upstream from the
structural gene, is under the control of σD and transcribes bolA constitutively. The promoter P1, proximal
to the structural gene, is a gearbox promoter under the control of σS from which bolA has been shown to
be transcribed in an inverse growth rate-dependent fashion [J Bacteriol 173:4474-4481 (1991)].

The alternate sigma factor σS is encoded by the gene rpoS and has been described as a central
regulator for the induction of a set of specific genes involved in adaptation to stationary phase. It has,
nevertheless, been shown that σS function is not confined to stationary phase. Significant increases in σS
cellular levels were seen during exponential growth in response to forms of stress; genes under its control
code for important adaptive regulators for general stress conditions [FEMS Microbiol Lett 30:419-430
(1997)].

The smaller morphology caused by stress-induced overexpression of bolA reduces the surface 
area exposed to the environment and decreases the cell's surface-to-volume ratio.

Identification of ortholog genes provides important information regarding functional and
structural conservation within these orthologs throughout evolution. The concept of comparative gene
identification has been previously used by many laboratories to search for orthologous genes once a
particular gene of interest has been identified in another species [Genome Res 10(5):703-13 (2000)].

The protein of the invention contains a signal peptide corresponding to a short helix as predicted by
the software TopPred II [Claros and von Heijne, CABIOS applic. Notes, 10:685-686 (1994)]. Thus, one
aspect of this invention provides materials and methods for the delivery of recombinant proteins to liver
cells. The signal peptide, encoded by an appropriate polynucleotide, can be linked to another
protein/polypeptide (also encoded by an appropriate polynucleotide). The recombinant gene, containing
the signal peptide sequence, is expressed and the desired protein is delivered via the signal peptide.
Methods of producing such gene fusions, the expression of such gene products, and their use are well
known to the skilled artisan.

In another embodiment, BolA, or biologically active fragments thereof, can be used to modulate 
the stress response to environmental changes such as cytotoxic agents, heat shock, irradiation, genotoxic
stress or growth factors.

In another aspect of the invention, SEQ ID NO: 299 is incorporated into a prokaryotic expression
vector and introduced into prokaryotic cells unable to adapt to environmental stress or cells containing a
BolA defect. The expression vector can, optionally, contain a promoter system such as that described
supra (e.g., P1, P2, σD, σS, etc.) which typically controls bolA expression. The components necessary for
transcription can be provided in one or more expression vectors. Prokaryotic cells thus transformed can
be useful in bioremediation systems where environmental stress is commonly encountered. Thus, 
preferred prokaryotes for the practice of this aspect of the invention lack bolA, or contain a bolA defect, 
and are known to be useful for bioremediation. 

In another embodiment, the subject invention provides methods and compositions to selectively
identify liver tissues. The protein of SEQ ID NO: 299 can be used to generate specific
polyclonal or monoclonal antibodies using any techniques known to those skilled in the art. These
antibodies can be used to selectively identify liver tissue according to well-known histological
immunoassays. The ability to immunologically identify tissue samples is industrially important for
analysis of mismarked biopsy samples (e.g., laboratory errors) where the origin of the tissue sample is in
question, or simply to verify that a tissue sample originated from liver. The antibodies can also be used to
identify cancer metastases originating from the liver. 

Further, antibodies provided by the subject invention can also be used to assay animal feeds for 
the presence of liver or liver by-products. As is known, many animal feeds contain animal protein. The 
use of animal feeds containing animal protein has been associated with disorders in both animals and
humans (the most notorious of which is bovine spongiform encephalopathy). This has resulted in the banning
of animal protein in feeds provided to animals. However, to ensure compliance with such bans, animal 
feeds must be tested. Thus, the antibodies of the invention can be used to test animal feeds for the 
presence of liver according to methods known to those skilled in the art. 

Proteins of SEQ ID NOs: 249 and 288 (internal designations 105-037-2-0-H11-CS and 174-7-4-0-H1-CS,
respectively)

The 403-amino-acid-long protein of SEQ ID NO: 249 encoded by the cDNA of SEQ ID NO: 8 is 
extensively homologous to the protein of SEQ ID NO: 288 encoded by the cDNA of SEQ ID NO: 47 with 
the exception of five amino acids in positions 192-194 and 298-299, which are not present in the protein of
SEQ ID NO: 288. It is likely that the two proteins are the result of alternative splicing and display
similar functions and utilities. 

The 403-amino-acid-long protein of SEQ ID NO: 249, overexpressed in salivary glands, exhibits
extensive homology to a Mus musculus hypothetical protein (Genbank accession number AB030196).
The amino acid residues of the protein of SEQ ID NO: 249 show a high degree of identity to the Genbank
sequence. However, the protein of the Genbank sequence lacks twenty amino acids (192 to 194,
298-303, 353-354, 380-381, and 387-393) and also differs from SEQ ID NO: 249 at 35 amino acid
positions. In addition, four transmembrane domains are predicted for the protein of SEQ ID NO: 249,
from positions 31 to 51, 75 to 95, 154 to 174, and 236 to 256, by the software TopPred
II (Claros and von Heijne, CABIOS applic. Notes, 10:685-686 (1994)).
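
As a simple check of the residue counts stated above, the position ranges can be tallied directly. The sketch below uses only the ranges quoted in the text; the function and variable names are hypothetical and introduced for illustration.

```python
def count_positions(ranges):
    """Count residues covered by inclusive (start, end) position ranges."""
    return sum(end - start + 1 for start, end in ranges)


# Positions of SEQ ID NO: 249 stated to be absent from the Genbank sequence.
absent_in_genbank = [(192, 194), (298, 303), (353, 354), (380, 381), (387, 393)]

# Positions of SEQ ID NO: 249 stated to be absent from SEQ ID NO: 288.
absent_in_seq_288 = [(192, 194), (298, 299)]

if __name__ == "__main__":
    print(count_positions(absent_in_genbank))  # prints 20
    print(count_positions(absent_in_seq_288))  # prints 5
```

The tallies agree with the twenty and five amino acids recited for the two comparisons.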

When expressed in E. coli, the matched sequence suppresses bacterial growth (Inoue et al.,
Biochem Biophys Res Commun 268:553-61 (2000)). It is therefore believed that the proteins of SEQ ID
NOs: 249 and 288 or a bacterial growth suppressing fragment thereof can be used to suppress bacterial
growth by contacting bacteria (gram negative or gram positive) with the polypeptides of the invention.
The growth inhibiting activity of the protein of the invention or part thereof may be assayed using any of
the assays known to those skilled in the art including those described in Inoue et al., supra.

In accordance with one aspect of the invention, methods and compositions using the protein of
the invention or a fragment thereof to suppress bacterial growth are provided. In a preferred embodiment,
the protein of the invention is expressed in a bacterium, preferably E. coli, using recombinant DNA
technology methods known to those skilled in the art. The expressed protein can then be used to inhibit
bacterial growth. The effects of the expressed protein and analogs or antagonists thereof can be assessed
using any methods or techniques known to those skilled in the art.

Further included in the invention are the polypeptides encoded by the human cDNA of clone
105-037-2-0-H11-CS-SD. The polypeptides of SEQ ID NO: 249 may be interchanged with the corresponding
polypeptides encoded by the human cDNA of clone 105-037-2-0-H11-CS-SD. Further included in the
invention are polynucleotides encoding said polypeptides. Preferred polynucleotides are those of SEQ ID
NO: 8 and of the human cDNA of clone 105-037-2-0-H11-CS-SD.

Nucleotide sequences encoding the polypeptides of SEQ ID NOs: 249 and 288 can be used to
generate probes for the detection of related genes. Vectors expressing the nucleotide sequence
can be used to express the polypeptide in target cells. Antisense nucleotides can be used to inhibit the
expression of the polypeptide.

Thus, in another embodiment of the invention the protein or a fragment thereof can be used as a
marker protein to selectively identify tissues, preferably salivary glands. For example, the protein of the
invention or a fragment thereof may be used to generate specific antibodies using any techniques known to
those skilled in the art. Such tissue-specific antibodies may then be used to identify tissues of unknown
origin, for example, forensic samples or differentiated tumor tissue that has metastasized to foreign bodily
sites, or to differentiate different tissue types in a tissue cross-section using immunochemistry. In another
embodiment, polynucleotides encoding the protein of SEQ ID NO: 249 can be used for in situ
hybridization.

The transcript coding for the protein gng3lg (Genbank accession number AF069954) is
transcribed from a bidirectional promoter divergently with the transcript coding for the gamma 3 subunit
protein called gng3, a novel human G binding protein gamma-3 (HGPG) (Genbank accession number
AF069953) and this organization is conserved across species within the human genome (Downes et al.,
Genomics, 53:220-230 (1998)).

Several genes which are linked in common physiological functions share a common divergent
bidirectional promoter, like αB crystallin and α crystallin, collagen type IV A1 and A2, surf 1-3 and surf
5 genes, and dihydrofolate reductase and mismatch repair 1 (Iwaki et al., Genomics, 45:386-394 (1997),
Burbelo et al., Proc. Natl. Acad. Sci. USA 85:9679-9682 (1988), Kaytes et al., J. Biol. Chem., 263:
19274-19277 (1988), Poschl et al., EMBO J., 7:2687-2695 (1988), Soininen et al., J. Biol. Chem., 263:
17217-17220 (1988), Garson et al., Genomics, 30:163-170 (1995), Williams et al., Mol. Cell. Biol., 6:
4558-4569 (1986), Fujii et al., J. Biol. Chem., 264:10057-10064 (1989)). The heterotrimeric G proteins,
a family of GTPases, are present in all cells and control a variety of functions (metabolic, humoral, neural
and developmental) by transducing hormonal, neurotransmitter and sensory signals into an array of
cellular responses. Triggered by cell surface receptors, each G protein regulates the activity of a specific
effector, including adenylate cyclase, phospholipase C, and ion channel proteins, which initiates appropriate
biochemical responses. In view of this, it is believed that the transcript coding for the protein of SEQ ID
NO: 249 shares common regulatory elements with the gng3 gene and that the products of such genes, which
are the protein of SEQ ID NO: 249 and gng3, are physiologically coupled in unknown ways. Thus, in an
embodiment of the invention, the protein of SEQ ID NO: 249 or part thereof may be used to regulate
signal transduction of hormonal, neurotransmitter, and sensory signals to provide an array of cellular
responses.

In yet another embodiment of the invention, the polypeptides of the present invention and the
related polynucleotides may be used to treat several types of disorders including, but not limited to,
cancer, neurodegenerative diseases, cardiovascular disorders, hypertension, renal injury and repair, and
septic shock.

In one embodiment, the protein of SEQ ID NO: 249 or a fragment, derivative or analog thereof
may be administered to a subject to treat or prevent a disorder associated with decreased expression or
activity of the protein.

In a further embodiment, a vector capable of expressing the protein of SEQ ID NO: 249 may be
administered to a subject to treat or prevent a disorder associated with decreased expression or activity of
the protein. Naked polynucleotides encoding the protein of SEQ ID NO: 249 may also be used.

In yet another embodiment, a pharmaceutical composition comprising a substantially purified
protein of SEQ ID NO: 249 may be administered to a subject to treat or prevent a disorder associated
with decreased expression of the protein.

In still another embodiment, the polypeptide of SEQ ID NO: 249 can be used to develop and 
screen antagonists. For example, purified polypeptide can be used to develop antibodies or to screen 
libraries of pharmaceutical agents to identify those that inhibit the physiological functions of the protein. 

Thus, in a further embodiment, an antagonist of the protein of SEQ ID NO: 249 can be
administered to a subject to prevent or treat a disease associated with increased expression or activity of
the protein. Similarly, in another embodiment a vector expressing the complement of a polynucleotide 
sequence encoding the protein of SEQ ID NO: 249 may be administered to decrease the expression of the 
protein. 

The protein of SEQ ID NO: 249 displays a leucine zipper pattern situated near its NH2 terminal
part (position 20 to 41). Thus, it is believed that the protein of SEQ ID NO: 249 is able to dimerize either
with itself (homo-dimerisation) or with a heterologous protein (hetero-dimerisation) of interest, through
the mediation of its leucine zipper domain. Preferred polypeptides of the invention are polypeptides
comprising the leucine zipper domain and fragments having any of the biological activities
described herein. The multimerization activity of the protein of the invention or part thereof may be
assayed using any of the assays known to those skilled in the art including circular dichroism spectrum
and thermal melting analyses as described in US patent 5,942,433. The utilities of proteins containing
leucine zipper domains, such as the protein of SEQ ID NO: 249, are described elsewhere in the
application.
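
For illustration, a leucine zipper pattern of the kind referred to above can be located by scanning a protein sequence for four leucines spaced seven residues apart. The sketch below assumes the PROSITE-style consensus L-x(6)-L-x(6)-L-x(6)-L, which spans 22 residues and is therefore consistent with the stated positions 20 to 41; the text does not state which pattern definition was actually used. The function name and the toy sequence are hypothetical, and the toy sequence is not SEQ ID NO: 249.

```python
import re

# Assumed consensus: L-x(6)-L-x(6)-L-x(6)-L (22 residues in total).
# The lookahead allows overlapping occurrences to be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein: str):
    """Return 1-based inclusive (start, end) positions of each pattern match."""
    hits = []
    for match in LEUCINE_ZIPPER.finditer(protein.upper()):
        start = match.start() + 1          # convert to 1-based numbering
        hits.append((start, start + 21))   # the pattern covers 22 residues
    return hits


if __name__ == "__main__":
    # Toy sequence (not SEQ ID NO: 249) with leucines at positions 20, 27, 34, 41.
    toy = "M" + "A" * 18 + "L" + ("A" * 6 + "L") * 3 + "A" * 10
    print(find_leucine_zippers(toy))  # prints [(20, 41)]
```

A pattern match of this kind only flags a candidate region; dimerization itself would still be assessed experimentally, for example by the assays mentioned above.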
Protein of SEQ ID NO: 259 (internal designation 114-016-1-0-H8-CS)

The protein of SEQ ID NO: 259, herein referred to as HOPP, is encoded by clone 114-016-1-0- 
H8-CS (SEQ ID NO: 18). This protein is homologous to a protein of Arabidopsis thaliana (ASY1) and 
Saccharomyces cerevisiae (HOP1) (Caryl A.P. et al. Chromosoma, 109, 62-71; Hollingsworth N.M. et
al. Cell, 61, 73-84).

In addition, the 394-amino-acid protein of SEQ ID NO: 259 displays a pfam HORMA domain
from position 22 to 230. The HORMA domain is a common structural element in proteins involved in
mitotic checkpoints, chromosome synapsis and DNA repair. For example, the HORMA domain was found
in: (1) MAD2, a key component of the mitotic-spindle-assembly checkpoint (reviewed in Straight AF.
Current Biology 1997, 7:613-616); (2) HOP1, a conserved protein that is involved in
meiotic-synaptonemal-complex assembly (Hollingsworth N.M. et al. Cell, 61, 73-84); and (3) Rev7p, a
subunit of the yeast DNA polymerase zeta that is involved in translesion, template-independent DNA
synthesis (Aravind L. and Koonin E.V., Trends Biochem Sci. 1998 Aug;23(8):284-6).

The pairing of homologous chromosomes during meiotic prophase culminates in the formation of
the synaptonemal complex (SC), which is a ribbon-like, proteinaceous structure that holds homologous
chromosomes in close apposition along their entire length. The SC is a
prominent and evolutionarily well conserved structure which is strictly meiotic. Evidence from mutant
phenotypes supports the hypothesis that recombination and SC formation are mutually interdependent
processes. First, although not required for homology recognition, the SC could promote interhomolog
interactions in situations where the normal processes have failed (e.g., interlocking, heterologous pairing,
etc.). Second, polymerization of the SC components might permit the recombination process to progress
by modulating the number and localization of reciprocal versus non-reciprocal exchanges (i.e. interference).
Third, the SC may play an important role in meiotic chromosome structure and especially inter-sister
interactions.

Synapsis of homologous chromosomes is a key event in meiosis as it is essential for normal
chromosome segregation and is implicated in the regulation of crossover frequency (for review see
Zickler D., J Soc Biol 1999;193(1):17-22). Mutants in HOP1 and ASY1, both proteins having significant
homology to the protein of SEQ ID NO: 259, display decreased levels of meiotic crossover and intragenic
recombination between markers on homologous chromosomes (Hollingsworth N.M., Byers B., Genetics
1989 Mar;121(3):445-62; Caryl AP et al. Chromosoma 2000;109(1-2):62-71).

Thus, the invention relates to methods and compositions using the protein encoded by clone
114-016-1-0-H8-CS or the polynucleotide of SEQ ID NO: 18, or biologically active fragments thereof, to
restore normal chromosome segregation in cells by administration, in therapeutically effective amounts, of
compositions comprising a HOPP polypeptide encoded by clone 114-016-1-0-H8-CS, or a
polynucleotide encoding a HOPP polypeptide. The loss of normal chromosome segregation in
normal cells leads to aberrant chromosome segregation events, a hallmark of tumor progression. HOPP
proteins, encoded by clone 114-016-1-0-H8-CS, can be targeted to the nucleus by nuclear targeting
sequences according to well-known methods. Nuclear targeting sequences (or NLS) can be chemically or
recombinantly attached to HOPP. Alternatively, the HOPP gene can be used in known gene therapy
protocols to restore normal chromosome function.

Infertility due to gametogenic failure is frequently associated with structural autosomal
abnormalities. Recent meiotic studies, at pachytene stage, have shown a failure around the breakpoints, an
association of the translocation figure with the sex chromosomes, and the frequent involvement of the
acrocentric chromosomes. Two main models are proposed to explain the male sterilizing effect of
rearrangements. The impairment of spermatogenesis could be the result of: 1) the XY-autosome
interaction; or 2) the disruption around the breakpoints at the pachytene stage. These defects may
contribute significantly to germ-cell atresia (for review see Luciani JM, Guichaoua MR, Reprod Nutr Dev
1990; Suppl 1:95s-103s and Miklos GL, Cytogenet. Cell Genet. 1974;13(6):558-77). Thus the subject
invention also relates to methods and compositions of using the protein of SEQ ID NO: 259 (HOPP) or clone
114-016-1-0-H8-CS, or biologically active fragments thereof, to reduce the incidence of infertility due to
gametogenic failure. HOPP can be introduced into sperm as described in the preceding paragraphs.
HOPP, optionally joined to an NLS sequence, can also be introduced into sperm or eggs by other methods
well known in the art (such as electroporation or microinjection).

The protein of SEQ ID NO: 259, encoded by clone 114-016-1-0-H8-CS, can also arrest cell
division in human cells if the mitotic spindle apparatus is improperly attached to the chromosomes
(Allshire R.C. Current Opinion in Genetics and Development 1997, 7:264-273). In the absence of
functional protein of SEQ ID NO: 259, cells exposed to drugs which inhibit the formation of a mitotic
spindle, such as benomyl, vinblastine, nocodazole, etc., would be expected to undergo rapid cell death due
to massive chromosome loss. Human cells containing HOPP would be expected to survive such drug
treatment because they are able to stop dividing prior to the chromosome loss event. Tumor cells that are
hypersensitive to chemotherapeutic agents which inhibit the formation of the mitotic spindle may be
sensitive to these drugs because they are defective in the checkpoint protein. Thus, screening assays for
the presence or absence of the protein in a given tumor would provide an indication of the
chemosensitivity of a particular tumor. The present invention therefore includes methods of determining
the prognostic benefit of treating a patient with a chemotherapeutic agent or determining from which
chemotherapeutic agent, of a group of at least two, a patient would more likely benefit.
Furthermore, the loss of checkpoint function in a normal cell may predispose that cell to aberrant
chromosome segregation events, a hallmark of tumor progression. Thus the antibodies, polypeptides and
polynucleotides of the present invention would be useful in diagnosing particular cancers.

Polyclonal antibodies can be produced by injecting a host animal such as rabbit, rat, goat, mouse
or other animal with an immunogen of this invention. Serum is extracted from the host animal and is
screened to obtain polyclonal antibodies which are specific to the immunogen. Methods of screening for
polyclonal antibodies are well known to those of ordinary skill in the art, such as those disclosed in
Harlow & Lane, Antibodies: A Laboratory Manual, (Cold Spring Harbor Laboratories, Cold Spring
Harbor, N.Y.: 1988), the contents of which are hereby incorporated by reference.

The monoclonal antibodies can be produced by immunizing, for example, mice with an
immunogen according to the invention. Methods of producing monoclonal antibodies are well known in
the art and include those described by Kohler, G. and Milstein, C., Nature (1975) 256: 495-497. Hybridomas
can be expanded, if desired, and supernatants can be assayed by conventional immunoassay procedures,
for example radioimmunoassay. Positive clones can be further characterized. Hybridomas that produce
the desired antibodies can be grown in vitro or in vivo using known procedures. The monoclonal
antibodies can be isolated by conventional immunoglobulin purification procedures such as ammonium
sulfate precipitation, gel electrophoresis, dialysis, affinity chromatography, and ultrafiltration.

Antibodies of the invention can be labeled with a detectable moiety. As noted above,
"detectable moieties" are well known to those of ordinary skill in the art and include, but are not limited to, a
fluorescent label, a radioactive atom, a paramagnetic ion, biotin, a chemiluminescent label or a label
which can be detected through a secondary enzymatic or binding step.

The invention further provides a method of determining the susceptibility of a tumor sample to
treatment with a mitotic spindle inhibitor which comprises the steps of: a) contacting the tumor sample with
an antibody, wherein the antibody is labeled with a detectable moiety and is capable of specifically
binding to the protein of the invention, or a fragment thereof, and b) assaying for the presence of an
immunocomplex formed in step (a). The absence of the immunocomplex indicates that the tumor would
be susceptible to treatment with a mitotic spindle inhibitor such as benomyl, vinblastine, or nocodazole.

The subject invention also provides a pharmaceutical composition comprising nucleic acid
encoding the protein of SEQ ID NO: 259 and a carrier. In one aspect of this invention, the compositions
are capable of passing through a cell membrane and provide for the expression of the protein of the invention.
As used herein, the term "carrier" includes pharmaceutically acceptable carriers and encompasses any of
the standard pharmaceutically accepted carriers, such as phosphate buffered saline solution, water,
emulsions such as an oil/water emulsion or a triglyceride emulsion, various types of wetting agents,
tablets, coated tablets and capsules. Such carriers typically contain excipients such as starch, milk, sugar,
certain types of clay, gelatin, stearic acid, talc, vegetable fats or oils, gums, glycols, or other known
excipients. Flavor and color additives or other ingredients can also be included. In addition to the standard
characteristics of the pharmaceutically acceptable carriers, the "suitable" carriers of the subject invention
can also include those carriers which are able to penetrate the cell membrane. Therefore, in one embodiment
of the pharmaceutical composition, the pharmaceutically acceptable carrier binds to a receptor on a cell and
is capable of being taken up by the cell after binding to the receptor.

This invention further provides a method of suppressing tumor formation in a subject which
comprises administering a nucleic acid encoding the protein of the invention in an amount effective to
enhance expression of this protein.

Proteins of SEQ ID NOs: 311 and 312 (internal designations 188-28-4-0-B12-CS.corr and
188-28-4-0-B12-CS.fr, respectively)

The 466-amino-acid-long protein of SEQ ID NO: 311, encoded by the human cDNA of clone
188-28-4-0-B12-CS or the cDNA nucleotide sequence of SEQ ID NO: 70, is related to proliferating-cell
nucleolar antigen p120 (Genbank accession number M32110), encoded by nol1, and to the yeast nucleolar
protein Nop2p, encoded by nop2 (Genbank accession number U12141). SEQ ID NO: 311 (encoded by clone
188-28-4-0-B12-CS.corr) shows strong homology with three proteins described as homologs of p120
(Genbank accession number AK002229 and Geneseqp accession numbers: Y86441, Y86442). In
addition, the protein of SEQ ID NO: 311 is a polymorphic variant of SEQ ID NO: 312 encoded by the
cDNA of SEQ ID NO: 71.

In addition, the protein of the invention exhibits the pfam NOL1/NOP2/sun family signature from
positions 201 to 276. This motif is also found by emotif from positions 230 to 245. The NOL1/NOP2/sun
family includes p120 and Nop2p. These proteins are involved in nucleolar structure and activity as well as
the regulation of the cell cycle.

Freeman J.W. et al. (Cancer Res. 48: 1244-51, 1988) identified p120, a 120-kD nucleolar antigen
associated with proliferating cells. This protein is a proliferation-associated antigen that is temporally
regulated during the cell cycle and demonstrates a dramatic increase in expression at the G1-S boundary.
This suggests that p120 can play a role in the regulation of the cell cycle and the increased nucleolar
activity that is associated with cell proliferation (Fonagy A. et al. (1993) J. Cell. Physiol. 154: 16-27).

The human p120 protein is also the most cancer specific of the identified proliferation-associated
nucleolar proteins. Antigen p120 was detectable in a broad range of human malignant tumors but not in
benign tumors or corresponding normal tissues. The antigen was not detectable in growth-arrested cells
but was expressed early in G1 of the cell cycle.

Overexpression of human p120 leads to the transformation of NIH 3T3 cells. Expression of
antisense p120 constructs causes the p120-transformed cells to revert to their original phenotype. Perlaky
L. et al. [Cancer Res. 52: 428-36 (1992)] and Valdez et al. [Cancer Res. 52: 5681-87 (1992)] reported that
the middle region of antisense p120 RNA inhibited proliferation of NIH 3T3 cells to approximately the
same extent as the full-length antisense construct. The predicted mouse and human p120 proteins are
63% identical.

Another protein of the Nol1/Nop2/Sun family, Nop2p, encoded by the gene NOP2, has a role in
nucleolar function during the onset of growth and in the maintenance of nucleolar structure (de Beus et al.
(1994) J. Cell Biol. 127:1799-813). The two proteins, p120 and Nop2p, are associated with ribosomal RNA
in pre-ribosomal particles and can mediate the maturation process of the ribosome (Hong B. et al. (1997)
Mol. Cell Biol. 17:378-88; Gustafson W.C. et al. (1998) Biochem. J. 331:387-93).

The subject invention provides the polypeptides encoded by the human cDNA of clone
188-28-4-0-B12-CS and polynucleotide sequences encoding the same amino acid sequences. Also included in the
invention are biologically active fragments of the protein encoded by the human cDNA of clone
188-28-4-0-B12-CS and polynucleotide sequences encoding these biologically active fragments. "Biologically
active fragments" are defined as those peptide or polypeptide fragments having at least one of the
biological functions of the full length protein (e.g., the ability to transform cell lines in vitro).

The invention also provides variants of the protein of SEQ ID NO: 311, encoded by clone
188-28-4-0-B12-CS. These variants have at least about 80%, more preferably at least about 90%, and most
preferably at least about 95% amino acid sequence identity to the amino acid sequence encoded by clone
188-28-4-0-B12-CS. Variants according to the subject invention also have at least one functional or
structural characteristic of the protein encoded by clone 188-28-4-0-B12-CS. The invention also provides
biologically active fragments of the variant proteins. Unless otherwise indicated, the methods disclosed
herein can be practiced utilizing the protein encoded by clone 188-28-4-0-B12-CS, or clone
188-28-4-0-B12-CS, or variants thereof. Likewise, the methods of the subject invention can be practiced using
biologically active fragments of the protein encoded by clone 188-28-4-0-B12-CS, clone
188-28-4-0-B12-CS, or variants of said biologically active fragments.
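
The identity thresholds recited above can be checked computationally once a candidate variant has been aligned to the reference sequence. The following is a minimal sketch only. It assumes the two sequences are already aligned (gaps written as "-"), and it counts identity over all alignment columns, which is one of several conventions in use. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity = identical residues / alignment columns,
    ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the 80% / 90% / 95% thresholds of the text."""
    if pct >= 95.0:
        return "at least about 95%"
    if pct >= 90.0:
        return "at least about 90%"
    if pct >= 80.0:
        return "at least about 80%"
    return "below 80%"


# Toy 10-residue example with a single substitution (hypothetical sequences).
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQR"
print(identity_tier(percent_identity(reference, candidate)))
```

In practice the alignment itself would come from a standard alignment program; the sketch covers only the final thresholding step.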

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode
the amino acid sequence provided by clone 188-28-4-0-B12-CS. It is well within the skill of a person
trained in the art to create these alternative DNA sequences encoding proteins having the same, or
essentially the same, amino acid sequence. These variant DNA sequences are, thus, within the scope of
the subject invention. As used herein, reference to "essentially the same" sequence refers to sequences
that have amino acid substitutions, deletions, additions, or insertions that do not materially affect
biological activity. Fragments retaining one or more characteristic biological activities of the protein
encoded by clone 188-28-4-0-B12-CS are also included in this definition.

"Recombinant nucleotide variants" are alternate polynucleotides which encode a particular 
protein. They can be synthesized, for example, by making use of the "redundancy" in the genetic code.
Various codon substitutions, such as the silent changes which produce specific restriction sites or codon
usage-specific mutations, can be introduced to optimize cloning into a plasmid or viral vector or
expression in a particular prokaryotic or eukaryotic host system, respectively.
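
As an illustration of a silent change that produces a restriction site, the sketch below builds the standard genetic code and shows that replacing one glycine codon (GGT) with a synonymous codon (GGA) leaves the encoded peptide unchanged while creating a BamHI recognition site (GGATCC). The nine-nucleotide sequence is a toy example chosen for this purpose, not a sequence from the present application, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence (length a multiple of 3) to protein."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(codon: str) -> list[str]:
    """Return all codons encoding the same amino acid as `codon`, sorted."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


original = "ATGGGTTCC"   # Met-Gly-Ser (toy sequence)
variant = "ATGGGATCC"    # GGT -> GGA is silent and creates GGATCC (BamHI)

print(translate(original), translate(variant))
print(synonymous_codons("GGT"))
print("GGATCC" in original, "GGATCC" in variant)
```

The same lookup could be used to pick, among synonymous codons, the one preferred by a given host, provided a codon-usage table for that host is supplied.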

In one aspect of the subject invention, SEQ ID NO: 311, encoded by clone 188-28-4-0-B12-CS,
and variants thereof, can be used to generate polyclonal or monoclonal antibodies. Both biologically
active and immunogenic fragments of the amino acid sequence or variant proteins can be used to produce
antibodies. Polyclonal and/or monoclonal antibodies can be made according to methods well known to
the skilled artisan. Antibodies produced in accordance with the subject invention can be used in a variety of
detection assays known to those skilled in the art. Another aspect of this invention provides monoclonal
and polyclonal antibodies which do not cross-react with known p120 proteins.

In one embodiment, the protein encoded by clone 188-28-4-0-B12-CS, variants of said protein,
and biologically active fragments of the protein or said variants can be used as a nucleolar-fraction marker
in nuclear fractionation studies or as a marker of pre-ribosomal particles, in methods well known to the
skilled artisan.

In another embodiment, the protein encoded by clone 188-28-4-0-B12-CS, variants of said
protein, and biologically active fragments thereof, can be used as a proliferation marker in neoplastic
cells. Alternatively, quantitative immunoassays can be used to assess the levels of the protein in resected
cancerous and normal tissues. Alternatively, levels of the protein encoded by clone 188-28-4-0-B12-CS
can be compared between an individual and a "normal" control group as a prognostic indicator of
malignancy. Further, the relationship between protein expression and cell proliferation can be assayed
using cancer cell lines. Thus, the protein of the invention or part thereof can be used as a marker for
proliferation in human cancer cells in vivo and in vitro. If the absence of expression of the protein of the
invention in normal tissues and benign tumors is confirmed, it could serve as a marker of malignant cancer cells.
The proliferation rate of cancer cells can also be determined by quantitative analysis of the expression of
the protein encoded by 188-28-4-0-B12-CS, or biologically active fragments thereof, according to
methods described in Trere D. et al. (J. Pathol. 192:216-20, 2000).

The transforming activity of the protein of the invention can be assayed as described in Perlaky et
al. (Anticancer Drug Des. 8:3-14, 1993). Thus, polynucleotides encoding the (188-28-4-0-B12-CS)
protein can be used to induce transformation of NIH/3T3 cells in vitro. Alternatively, the polynucleotide
encoding (188-28-4-0-B12-CS) can be used to provide antisense oligonucleotides useful in antisense
therapeutic protocols according to methods known in the art.

Protein of SEQ ID NO: 406 (internal designation 174-32-4-0-F8-CS)

The 378-amino-acid-long protein of SEQ ID NO: 406 encoded by the cDNA of SEQ ID NO: 165
is expressed in tissues such as colon, prostate and salivary glands and overexpressed in colon and
prostate. The C-terminus of the protein of the invention is homologous to the human
retinoblastoma-binding protein, RbAp48 (Qian YW et al. (1993) Nature 364:648-652, GenBank accession number:
X74262) and to its homologues conserved in other organisms including mouse (GenBank accession
number: Q60972) and C. elegans (GenBank accession number: AF116530). The protein of the invention
also contains two internal WD-repeat clusters (Prosite PS00678, amino acid positions 267 to 304 and
positions 333 to 370, respectively), a structural motif involved in protein-protein interactions in signal
transduction pathways and transcription regulation (Neer EJ et al. (1994) Nature 371:297-300; Neer EJ et
al (1996) Cell 84:175-178).

The retinoblastoma protein (Rb) is the product of the retinoblastoma gene. Deletion or
inactivation of both Rb alleles is essential in the formation of human retinoblastoma in both hereditary
and sporadic forms (Benedict WF et al. (1983) Science 219: 973-975).

Loss-of-function mutations in the Rb gene are also found in many other tumor types, including
osteosarcoma, breast carcinoma, small cell lung carcinoma, bladder carcinoma, prostate carcinoma and
soft tissue sarcoma (Bookstein R et al. (1991) Crit. Rev. Oncog. 2:211-227). Introduction of the
wild-type Rb gene into cultured retinoblastoma cells suppresses cell growth and their tumorigenicity in nude
mice (Huang HJ et al (1988) Science 242:1536-1566). Expression of normal Rb protein in prostate carcinoma,
osteosarcoma, breast carcinoma and bladder carcinoma cells also suppresses their neoplastic phenotype,
thus establishing the Rb gene as a tumor suppressor (Reviewed by Weinberg RA (1991) Science
254:1138-1146).

It has been shown that the Rb gene product is a nuclear phosphoprotein that undergoes cyclic
phosphorylation and dephosphorylation during cell cycling. Rb is unphosphorylated or
"underphosphorylated" during early G1 phase, becomes phosphorylated just before S phase, and
remains phosphorylated until late mitosis. Injection of unphosphorylated Rb protein into cells during early
G1 phase inhibits the entry into S phase, suggesting that some of the growth suppressor functions of Rb
may be carried out by the underphosphorylated form of Rb (Goodrich et al. (1991) Cell 67:293-302;
Hinds PW et al. (1992) Cell 70:993-1006). Rb protein not only regulates the cell cycle, but is also involved
in cell differentiation. For example, lens epithelial cells in Rb-deficient mice fail to terminally
differentiate and undergo apoptosis (Morgenbesser et al. (1994) Nature 371:72-74).

It has been demonstrated that Rb protein inhibits cellular growth and proliferation through
interactions with multiple cellular proteins that interfere with these cellular proteins' downstream actions.
For example, Rb protein is able to form specific complexes with the transcription factor E2F, which regulates
the expression of a set of genes essential for the G1 to S phase transition (Nevins JR et al (1992) Science
258:424-429). The Rb protein restrains cell cycle progression by masking the E2F transactivation domain
and by blocking the interaction of surrounding enhancer elements and the basal transcription complex
(Weintraub SJ et al (1995) Nature 375:812-815). Association of Rb and UBF, a ribosomal transcription
factor, results in suppression of the synthesis of ribosomal RNA by RNA polymerase I (Cavanaugh AH et al
(1995) Nature 374:177-180; Mancini M et al (1994) Proc.Natl.Acad.Sci.USA 91:418-422).

RbAp48 was first identified as a major protein from HeLa cells that binds to a putative functional
domain at the C-terminus of the Rb protein. Only unphosphorylated and hypophosphorylated forms of the
Rb protein were coprecipitated with RbAp48. Like Rb protein, RbAp48 is a ubiquitously expressed nuclear
protein that shares sequence homology with MSI1, a negative regulator of the Ras-cAMP pathway in the
yeast Saccharomyces cerevisiae. Overexpression of RbAp48 can convert the yeast mutant strains from
heat-shock sensitivity to heat-shock resistance, similar to the result obtained from MSI1 overexpression. Thus, the
human RbAp48 is a functional homologue of MSI1 (Qian YW et al. (1993) Nature 364:648-652).

RbAp48 protein was later found to be the p48 subunit of mammalian chromatin assembly factor 1
(CAF-1) and to be present in histone deacetylase complex (Parthun MR et al (1996) Cell 87:85-94). CAF-1
from human cell nuclei consists of three subunits, p150, p60 and p48, and is involved in the assembly of
histones H3 and H4 onto nascent nucleosome structures during DNA replication in S phase (Kaufman PD et
al (1995) Cell 81:1105-1114). Indeed, some transcriptional repressors function through the recruitment of the
histone deacetylase complex (HDAC); the latter acts by acetylating or deacetylating the tail protruding from
the core histones, thereby modulating the local structure of chromatin (Reviewed by Pazin MJ et al (1997)
Cell 89:325-328). Rb protein recruits HDAC for binding to E2F to repress transcription (Brehm A et al
(1998) Nature 391:597-601; Magnaghi-Jaulin L et al (1998) Nature 391:601-604). It was also reported that
the p48 subunit of chicken CAF-1 can bind to chicken HDAC in vitro through interaction of WD-40 repeats
present in both protein sequences (Ahmad A et al (1999) J.Biol.Chem. 274:16646-16653).

The WD-40 protein family is characterized by the repetition of a loosely conserved repeat of
approximately 40 amino acids, each repeat being separated from each other by a Trp-Asp dipeptide sequence.
The conserved core of this repeat, which usually ends with the amino acids Trp-Asp (WD), was first
identified in the beta-subunit of the heterotrimeric GTP-binding protein, G-protein (Fong H et al. (1986)
Proc.Natl.Acad.Sci.USA 83:2162-2166). Among the WD-40 proteins identified to date, none are enzymes,
and all seem to have regulatory functions (Neer, E. J. et al. (1994) Nature 371:297-300). A number of WD
repeat proteins have been localized to the nucleus and function in the repression of transcription. These
include Tup1, Hir1, and Met30 in S. cerevisiae; SCON2 in Neurospora crassa; extra sex combs and Groucho
in Drosophila; COP1 in Arabidopsis thaliana; and HIRA and the family of TLE proteins in humans. These
WD-40 proteins turn off a wide variety of genes, including those involved in segmentation, sex
determination, and neurogenesis (controlled by Groucho) and those involved in photomorphogenesis
(controlled by COP1). All of these WD-40 containing proteins have been proposed to fold into propellers in
which the internal beta-strands form a rigid skeleton that is fleshed out on the surface by specialized loops to
which other proteins bind (Lambright DG et al (1996) Nature 379:311-319; Sondek J et al (1996) Nature
379:369-374).

Thus, discovery of new Rb-binding proteins is necessary to design methods of regulating cell growth
and blocking tumorigenesis through the control of tumor suppressor proteins in their interaction with oncogene
products, and may provide new compositions which are useful in the diagnosis, prevention and treatment of
cancer and developmental disorders.

It is believed that the protein of SEQ ID NO: 406 or part thereof plays a role in the control of gene
expression, probably as a transcription repressor. The protein of the invention is thought to be able to bind to
other proteins, preferably to nuclear proteins, more preferably to Rb. Preferred polypeptides of the invention
are polypeptides comprising fragments of SEQ ID NO: 406 from positions 159-373, 267-304 and 333-370.
Other preferred polypeptides of the invention are polypeptides comprising fragments of SEQ ID NO: 406
having any of the biological activities described herein. The ability of the protein of the invention or part
thereof to function as a transcription repressor may be assessed using techniques well known to those skilled
in the art including those described previously (Weintraub SJ (1995) Nature 375:812-815; Qian YW (1995)
J.Biol.Chem. 270:25507-25513). The ability of the protein of the invention or part thereof, especially
fragments containing WD-repeats, to bind to other proteins may be assessed using techniques well known to
those skilled in the art including those described herein. For example, the protein of the invention could be
used as a "bait" protein in a yeast two-hybrid system (e.g. Gal-4-based system from Clontech) to
isolate and eventually to identify its interacting protein partner in vivo from a cDNA library. Alternatively,
the protein of the invention or part thereof can be used either in a pure form or in a fusion form (linked to a
reporter gene product, such as alkaline phosphatase) to screen a phage cDNA expression library derived from
selected tissues or cell types of a given organism (Scott et al (1990) Science 249:386-390; Lam et al (1992)
Nature 354:82-84). Preferably, the binding ability of the protein of the invention is tested in mammalian cell
transfection experiments. When the protein is fused in-frame to a suitable peptide tag in an expression vector,
such as [His]6 in the pRSET expression plasmid vector (Invitrogen), and introduced into cultured cells, the
proteins that bind to the expressed fusion protein can be immunoprecipitated using anti-[His]6 antibody. This
approach can also be employed to confirm the findings obtained from either the yeast two-hybrid system or in
vitro phage peptide library screening. In this case, the putative interacting partner protein will be fused to a
distinct tag in a second expression vector and co-transfected into cultured cells. A true binding complex will be
co-immunoprecipitated with the two different anti-tag antibodies. In a particular embodiment, an affinity
chromatography method is carried out to identify the interacting protein partners in vitro from cell lysates as
performed for the identification of the RbAp48 protein (Qian YW et al. (1993) Nature 364:648-652).

An embodiment of the present invention relates to methods of using the protein of the invention
or part thereof, particularly polypeptides containing WD-motifs, or derivatives thereof to identify and/or
quantify binding proteins, preferably nuclear proteins, more preferably Rb, in a biological sample, and
thus used in assays and diagnostic kits for the quantification of such binding proteins in bodily fluids, in
tissue samples, and in mammalian cell cultures. Such assays may be particularly useful as diagnostic or
prognostic tools in the detection and monitoring of a disorder linked to dysregulation of expression of a
transcription regulator. Such assays may thus be very useful to assess the level of the tumor suppressor Rb
in disorders including but not limited to developmental disorders, cancers such as retinoblastoma, prostate
carcinoma, osteosarcoma, breast carcinoma and bladder carcinoma. The binding activity of the protein of
the invention or part thereof may be assessed using any method familiar to those skilled in the art.
Preferably, a defined quantity of the protein of the invention or part thereof is added to the sample under
conditions allowing the formation of a complex between the protein of the invention or part thereof and
the binding protein to be identified and/or quantified. Then, the presence of the complex and/or the
free protein of the invention or part thereof is assayed and eventually compared to a control using any of
the techniques known by those skilled in the art.

Another embodiment of the present invention relates to compositions and methods using the protein
of the invention or part thereof or derivative thereof to block gene transcription either in vitro or in vivo. In a
preferred embodiment, the protein of the invention or part thereof or derivative thereof is added in an
effective amount to an in vitro culture to inhibit gene expression and thus cell proliferation using molecular
biology techniques known to those skilled in the art allowing the import of the protein from the extracellular
medium to the cell's nucleus. In another embodiment, eukaryotic cells are genetically engineered in order to
express the protein of the invention or part thereof under specific conditions in order to prevent further
proliferation of such cells upon demand, for example upon infection, transformation, activation,
differentiation, or the end of a production process.

A preferred embodiment of the invention relates to compositions or methods using SEQ ID NO:
406, SEQ ID NO: 165 or part thereof to diagnose, treat and/or prevent disorders including but not limited
to disorders linked to dysregulation of gene transcription such as cancers and other disorders relating to
abnormal cellular differentiation, proliferation, or degeneration, including hyperaldosteronism,
hypocortisolism (Addison's disease), hyperthyroidism (Graves' disease), hypothyroidism, colorectal
polyps, gastritis, gastric and duodenal ulcers, ulcerative colitis, and Crohn's disease; metabolic diseases
such as obesity and a number of inflammatory diseases due to interleukin over-expression. For diagnostic
and prognostic purposes, the expression of the protein of the invention could be investigated using any of
the Northern blotting, RT-PCR or immunoblotting methods described herein and compared to the
expression in control individuals. For prevention and/or treatment purposes, the protein of the invention
may be overexpressed using any of the gene therapy methods known to those skilled in the art including
those described herein. For example, expression of the protein of the invention can be upregulated by
infecting tumor cells with a retroviral or an adenoviral vector which expresses the desired protein at
higher levels necessary for suppression of mutation in the Rb gene or in other oncogenic or tumor
suppressor genes.

Another related embodiment relates to the use of SEQ ID NO: 406, SEQ ID NO: 165, its
complement, or any part thereof to develop antagonists of the protein of the invention. These antagonists
could be antisense oligonucleotides, triple helices, ribozymes, small molecules or antibodies, especially
neutralizing antibodies binding to the WD-repeats of the invention, and may be used to treat diseases and
conditions caused by abnormally low transcription. These conditions include accelerated aging
syndromes such as Cockayne's syndrome, Ataxia telangiectasia and Werner's syndrome, as well as
age-associated diseases and "early onset" forms of diseases associated with old age such as dementia
and Parkinson's disease.

In another embodiment, the invention relates to methods and compositions using the protein of
the invention or part thereof as a marker protein to selectively identify tissues, preferably colon and
prostate. For example, the protein of the invention or part thereof may be used to synthesize specific antibodies
using any techniques known to those skilled in the art including those described herein. Such
tissue-specific antibodies may then be used to identify tissues of unknown origin, for example, forensic samples,
differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate different tissue
types in a tissue cross-section using immunochemistry.

Protein of SEQ ID NO: 414 (internal designation 188-27-3-0-G1-CS)

The 389-amino-acid-long protein of SEQ ID NO: 414, expressed in brain, fetal brain, placenta and
testis, over-expressed in brain and encoded by the cDNA of SEQ ID NO: 173, is homologous to
SIRTUIN-2 (SIRT2) (Trembl accession number: Q9Y6E9) and Silent Information Regulator 2-like
protein (SIR2L) (Trembl accession number: Q9UNT0), which belong to the Silent Information Regulator
type 2 (SIR2) family. In addition, the protein of the invention presents the Pfam signature for members of
the SIR2 family (amino acids 84-268). Furthermore, the protein of the invention displays two
characteristic motifs highly conserved among all members of the SIR2 family that have been shown to be
essential in the SIR2 silencing function (Moira M. et al., Genetics, 154:1069-1083 (2000)). These motifs
are from positions 84 to 98 and from positions 165 to 170 of the protein of SEQ ID NO: 414 and
correspond to GAG(I/V)SxxxG(I/V)PDFERS and (Y/I)TQNID patterns respectively. The protein of SEQ
ID NO: 414 also has conserved cysteine residues at positions 195, 200, 221 and 224, covering a domain
thought to be either a DNA-binding zinc-finger motif (Prodom prediction PD002659, from positions 195
to 224) or an enzymatic domain (or an enzyme cofactor) (Moira M. et al., Genetics, 154:1069-1083
(2000)).

The cDNA of SEQ ID NO: 173 encoding the protein of the invention differs from the one
encoding the SIRT2 protein by a supplementary exon between positions 147 to 195. This exon modifies
the initiation codon of the protein and extends the ORF in its N-terminal part by 16 amino acids.
Moreover, amino acid residues in positions 20 and 21 of the protein of the invention (respectively an alanine
and a glutamine residue) are substituted for a glutamine and a tyrosine residue (positions 4 and 5) of the
SIRT2 protein. Thus, the protein of the invention is a new isoform of SIR2 resulting from alternative
splicing. The protein of the invention of SEQ ID NO: 414 is also 37 amino acids longer than the SIR2L
protein at its N-terminal end.
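
The two SIR2-family motif patterns quoted above can be located in a candidate sequence by converting the pattern notation into a regular expression. The sketch below is illustrative only. It assumes that "(I/V)" denotes alternative residues and that a lowercase "x" denotes any residue, and it is run on a short made-up peptide rather than on SEQ ID NO: 414. The function names and the toy sequence are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Convert notation such as 'GAG(I/V)SxxxG' into the regex 'GAG[IV]S...G'.

    Assumptions: '(A/B)' lists alternative residues; lowercase 'x' is a wildcard.
    """
    regex = re.sub(
        r"\(([A-Z](?:/[A-Z])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        pattern,
    )
    return regex.replace("x", ".")


def find_motif(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of each match."""
    compiled = re.compile(pattern_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in compiled.finditer(sequence)]


# Patterns as printed in the text above.
MOTIF_1 = "GAG(I/V)SxxxG(I/V)PDFERS"
MOTIF_2 = "(Y/I)TQNID"

# Made-up peptide containing one instance of each pattern (not a real sequence).
toy = "MK" + "GAGISTSAGVPDFERS" + "LLP" + "YTQNID" + "AA"

print(find_motif(toy, MOTIF_1))
print(find_motif(toy, MOTIF_2))
```

Positions are reported in the 1-based, inclusive convention used throughout the text for amino acid ranges.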

Regulation of gene expression by alterations in chromatin structure is a universal mechanism in
eukaryotic cells, responsible for maintaining patterns of gene expression throughout the development of
multicellular organisms. Silencing has been studied most extensively in S. cerevisiae (yeast). Among the
SIR genes, SIR2 is the most evolutionarily conserved, and a number of genes with homology to SIR2
have been identified (Frye R et al., Biochem. Biophys. Res. Commun., 273:793-798 (2000)). The presence of
homologues of SIR2 (HSTs) in organisms from bacteria to humans suggests that SIR2's silencing
mechanism might be conserved. SIR2 was originally discovered to influence mating-type control in
haploid cells by locus-specific transcriptional silencing. It has also been suggested that SIR2 and its
homologs play additional roles in suppression of recombination, chromosomal stability, metabolic
regulation, meiosis, and aging (for a review: see Gartenberg, Curr. Opin. Microbiol. 3:132-137 (2000)).

Proteins of the SIR2 family are also thought to be either enzymes or enzyme cofactors. First,
Landry and collaborators have shown that members of the SIR2 family catalyze histone deacetylation in a
reaction that requires NAD, thereby distinguishing them from previously characterized deacetylases. This
enzyme is active on histone substrates that have been acetylated by both chromatin assembly-linked and
transcription-related acetyltransferases (Landry et al., Proc. Natl. Acad. Sci. 97:5807-5811 (2000)).
Discovery of an intrinsic deacetylation activity for the conserved SIR2 family provides a mechanism for
modifying histones and other proteins to regulate transcription and diverse biological processes. Secondly,
a human SIR2 family member (hSirT2) was found to have a mono-ADP ribosylation activity
in vitro (Frye R et al., Biochem. Biophys. Res. Commun., 260: 273-279 (1999)). Among potential
substrates for mono-ADP ribosylation are histones and RNA Pol I, modification of which correlates with
enhanced rDNA transcription.

It is believed that the protein of SEQ ID NO: 414 or part thereof plays a role in gene silencing, 
suppression of recombination, chromosomal stability, metabolic regulation, meiosis, and aging, probably 
as a member of the SIR2 protein family. Particularly, the protein of the invention may deacetylate 
substrates, preferably acetylated histones and acetyltransferases, either directly or indirectly as an enzyme 
cofactor. Particularly, the protein of the invention may have a ribosylation activity, preferably a 
mono-ADP ribosylation activity, preferably on histones and RNA Pol I substrates, either directly or indirectly as 
an enzyme cofactor. Additionally, the protein of the invention may be a DNA binding protein. Preferred 
polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID NO: 414 from 
positions 1 to 16, 84 to 98, 165 to 170, 195 to 224, and 84 to 268 as well as fragments of SEQ ID NO: 414 
containing at least one of the cysteine residues located in positions 195, 200, 221 or 224 of SEQ ID NO: 414. 
Other preferred polypeptides of the invention are fragments of SEQ ID NO: 414 having any of the 
biological activities described herein. The deacetylation activity of the protein of the invention or part 
thereof may be assayed using any of the assays known to those skilled in the art including those described 
in Landry et al., supra. The ribosylation activity of the protein of the invention or part thereof may be 
assayed using any of the assays known to those skilled in the art including those described in Frye et al. 
(1999). The nucleic acid binding activity of the protein of the invention or part thereof may be assayed 
using any of the assays known to those skilled in the art including those described in US patent 6,013,453. 
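
By way of illustration only, the position ranges recited above can be applied to a sequence programmatically. The following Python sketch assumes 1-based, inclusive numbering. The helper names `fragment` and `cysteines_in` and the placeholder sequence are hypothetical and are not part of the disclosure; in particular, the placeholder does not represent SEQ ID NO: 414.

```python
from typing import List, Tuple

# Position ranges and cysteine positions as recited in the text for SEQ ID NO: 414.
PREFERRED_RANGES: List[Tuple[int, int]] = [
    (1, 16), (84, 98), (165, 170), (195, 224), (84, 268),
]
CYSTEINE_POSITIONS: List[int] = [195, 200, 221, 224]

# Placeholder only: NOT the real sequence. It merely needs to be long enough
# (at least 268 residues) for the recited ranges to be valid.
PLACEHOLDER = "A" * 300


def fragment(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq (1-based, inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside the sequence")
    return seq[start - 1:end]


def cysteines_in(start: int, end: int) -> List[int]:
    """Return the recited cysteine positions that fall inside start..end."""
    return [p for p in CYSTEINE_POSITIONS if start <= p <= end]


for start, end in PREFERRED_RANGES:
    frag = fragment(PLACEHOLDER, start, end)
    print(f"{start}-{end}: {len(frag)} residues, cysteines at {cysteines_in(start, end)}")
```

In this sketch, all four recited cysteine positions fall within the fragment spanning positions 195 to 224.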

The invention relates to methods and compositions using the protein of the invention or part 
thereof to silence gene expression. In a preferred embodiment, the protein of the invention or part thereof 
or derivative thereof is added in an effective amount to an in vitro culture to inhibit gene expression and 
thus cell proliferation using molecular biology techniques known to those skilled in the art allowing the 
import of the protein from the extracellular medium to the cell's nucleus. In another embodiment, 
eukaryotic cells are genetically engineered in order to express the protein of the invention or part thereof 
under specific conditions in order to prevent further proliferation of such cells upon demand, for example upon 
infection, transformation, the end of a production process, or differentiation. 

The invention relates to methods and compositions using the protein of the invention or part 
thereof to deacetylate substrates, alone or in combination with other substances. Such substrates are 
acetylated substrates, preferably acetylated histones and acetyltransferases. For example, the protein of 
the invention or part thereof is added to a sample containing the substrate(s) in conditions allowing 
deacetylation, and allowed to catalyze the deacetylation of the substrate(s). In a preferred embodiment, 
the deacetylation is carried out using a standard assay such as those described in Landry et al., supra. 
Deacetylated histones obtained by this method may be mixed with purified naked DNA (plasmid 
preparations for example) in order to reconstitute chromatin-like structures in vitro. Such structures are 
of great interest in the study of enzymatic factors involved in transcription and replication. 

Another embodiment of the present invention relates to compositions and methods of using the 
protein of the invention or part thereof to develop assays for in vitro screening of inhibitors directed 
against the encoded deacetylase activity using any technique known to those skilled in the art including 
those described herein. Such deacetylase inhibitors are of great potential as new drugs due to their ability 
to influence transcriptional regulation and to induce apoptosis or differentiation in cancer cells. 
Preferably, the protein of the invention, expressed in prokaryotic or eukaryotic systems according to 
methods known to those skilled in the art, may be mixed in vitro with a simple fluorescent substrate like 
an aminocoumarin derivative of an acetylated lysine, and different putative inhibitors. The coumarin 
derivative is then quantitated using a reverse-phase HPLC system with a fluorescence detector. Such an 
approach has been previously developed by Hoffmann and collaborators (Hoffmann et al., Nucl. Acids 
Res. 27:2057-2058 (1999); Hoffmann et al., Pharmazie 55:601-606 (2000)). 
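
By way of illustration only, the readout of such a screen may be reduced to a percent inhibition for each putative inhibitor. The following Python sketch is an assumption and is not taken from the cited references: it supposes that the quantitated fluorescence signal is the peak area of the deacetylated product, and the function name `percent_inhibition`, its parameters and the numerical values are hypothetical.

```python
def percent_inhibition(product_with_inhibitor: float,
                       product_without_inhibitor: float,
                       background: float = 0.0) -> float:
    """Percent inhibition of deacetylase activity from fluorescence peak areas.

    Assumes (hypothetically) that each value is the HPLC peak area of the
    deacetylated product, and that `background` is the area measured in a
    no-enzyme control.
    """
    control = product_without_inhibitor - background
    if control <= 0:
        raise ValueError("uninhibited control must exceed background")
    treated = product_with_inhibitor - background
    return 100.0 * (1.0 - treated / control)


# Hypothetical peak areas (arbitrary units) for one putative inhibitor.
print(percent_inhibition(25.0, 100.0))
```

A compound yielding one quarter of the uninhibited product signal would thus be scored at 75 percent inhibition under these assumptions.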

The invention relates to methods and compositions using the protein of the invention or part 
thereof to bind to nucleic acids, preferably DNA, alone or in combination with other substances. For 
example, the protein of the invention or part thereof is added to a sample containing nucleic acid in 
conditions allowing binding, and allowed to bind to nucleic acids. In a preferred embodiment, the protein 
of the invention or part thereof may be used to purify nucleic acids such as restriction fragments. In 
another preferred embodiment, the protein of the invention or part thereof may be used to visualize 
nucleic acids when the polypeptide is linked to an appropriate fusion partner, or is detected by probing 
with an antibody. Alternatively, the protein of the invention or part thereof may be bound to a 
chromatographic support, either alone or in combination with other DNA binding proteins, using 
techniques well known in the art, to form an affinity chromatography column. A sample containing 
the nucleic acids to be purified is run through the column. Immobilizing the protein of the invention or part 
thereof on a support is particularly advantageous for those embodiments in which the method is to be 
practiced on a commercial scale. This immobilization facilitates the removal of the protein from the batch 
of product and subsequent reuse of the protein. Immobilization of the protein of the invention or part 
thereof can be accomplished, for example, by inserting a cellulose-binding domain in the protein. One of 
skill in the art will understand that other methods of immobilization could also be used and are described 
in the available literature. 

Still another embodiment of the present invention relates to compositions and methods of using the 
protein of the invention or part thereof to identify genes or regions of the human genome silenced by the 
protein of the invention or part thereof. Genomic DNA derived from patients with pathologies such as 
cancer and metabolic disorders, or from elderly people, may be compared to that extracted from 
respective controls for its ability to bind the protein of the invention. As described previously, the 
protein of SEQ ID NO: 414 displays a putative zinc finger domain capable of binding DNA sequences 
near the regions or genes to be silenced. The protein of the invention or part thereof may be bound to a 
chromatographic support, using techniques well known in the art, to form an affinity chromatography 
column. A sample containing a mixture of human genomic DNA digested by restriction endonucleases is 
run through the column. After extensive washings the bound DNA is eluted and further subcloned in 
classical cloning vectors known to those skilled in the art. Immobilizing the protein of the invention or 
part thereof on a support is particularly advantageous for those embodiments in which the method has to 
be practiced routinely. This immobilization facilitates the removal of DNAs from the batch of resin-coupled 
protein after binding, and allows subsequent re-use of the protein. Immobilization of the protein 
of the invention or part thereof can be accomplished, for example, by inserting any matrix-binding 
domain in the protein according to methods known to those skilled in the art. The resulting fusion product 
including the protein of the invention or part thereof is then covalently, or by any other means, bound to a 
protein, carbohydrate or matrix (such as gold, "Sephadex" particles, and polymeric surfaces). 

Another embodiment of the invention relates to methods of preparing antibodies directed against 
the protein of the invention or part thereof. Such antibodies may be used in co-immunoprecipitation 
procedures that enrich for chromatin fragments containing binding sites for the protein of the invention. 
This method may identify genes or regions of the human genome silenced by the deacetylase activity of 
the protein of the invention. Preferably, in samples containing fragments of native chromatin, antibodies 
directed against the protein of SEQ ID NO: 414 and coupled to protein A or protein G sepharose beads are added to the mixture. 
Immunoprecipitation conditions are those known to those skilled in the art. After washings, DNA 
fragments co-precipitated with the protein of SEQ ID NO: 414 are extracted and further subcloned in routinely used cloning vectors. 
These DNA fragments are sequenced and/or used as probes to screen genomic libraries. This 
procedure is very similar to the one used by Gould and collaborators to enrich for embryonic chromatin 
fragments containing sites for the homeotic Ubx protein (Gould et al., Nature 348:308-312 (1990)). 

A preferred embodiment of the invention relates to compositions or methods using SEQ ID NO: 
414, SEQ ID NO: 173 or part thereof to diagnose, treat and/or prevent disorders caused by the 
expression of "disease causing" genes. The number of pathologies and conditions that could be treated by 
the protein of the invention is potentially very large. Preferred disorders are those linked to dysregulation 
of gene transcription, such as cancer and other disorders relating to abnormal cellular differentiation, 
proliferation, or degeneration, including hyperaldosteronism, hypocortisolism (Addison's disease), 
hyperthyroidism (Graves' disease), hypothyroidism, colorectal polyps, gastritis, gastric and duodenal 
ulcers, ulcerative colitis, and Crohn's disease; viral infections, especially HIV and viral hepatitis (i.e. 
expression of viral proteins); metabolic diseases such as obesity; and a number of inflammatory diseases 
due to interleukin over-expression. For diagnostic purposes, the expression of the protein of the invention 
could be investigated using any of the Northern blotting, RT-PCR or immunoblotting methods described 
herein and compared to the expression in control individuals. For prevention and/or treatment purposes, 
the protein of the invention may be overexpressed using any of the gene therapy methods known to those 
skilled in the art including those described herein. For example, switching off "disease" genes may be 
achieved by directly targeting the protein of the invention or part thereof to the genes (such 
as oncogenes in cancers) that are over-expressed in order to silence their expression. This could be 
achieved by making a "chimera" protein in which the putative zinc-binding domain is replaced by a 
sequence known to bind to, or near, the over-expressed gene as explained elsewhere in the application. 
Fusion proteins containing both the deacetylase activity and the specific DNA binding domain are 
obtained by methods of molecular biology well known to those skilled in the art. The corresponding 
eukaryotic expression vectors may be used in gene therapy in cases of cancer, metabolic disorders, 
aging and any disorder where a gene is over-expressed. Such recombinant cDNA may be introduced in 
the well known adenoviral vectors used in cancer therapy (for a recent review on the use of replicative 
adenoviruses for cancer therapy: Alemany et al., Nat. Biotechnol. 18:723-727 (2000)). 

Another related embodiment relates to the use of SEQ ID NO: 414, SEQ ID NO: 173, its 
complement, or any part thereof to develop antagonists of the protein of the invention and of the SIR 
complex. These antagonists could be antisense oligonucleotides, triple helices, ribozymes, small 
molecules or antibodies and may be used to treat diseases and conditions caused by abnormal gene 
silencing. These conditions include accelerated aging syndromes such as Cockayne's syndrome, ataxia 
telangiectasia and Werner's syndrome, as well as age-associated diseases and "early onset" forms of 
diseases associated with old age such as dementia and Parkinson's disease. 

In another embodiment, the invention relates to methods and compositions using the protein of 
the invention or part thereof as a marker protein to selectively identify tissues, preferably brain tissues. 
For example, the protein of the invention or part thereof may be used to synthesize specific antibodies using any 
techniques known to those skilled in the art including those described herein. Such tissue-specific 
antibodies may then be used to identify tissues of unknown origin, for example, forensic samples, 
differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate different tissue 
types in a tissue cross-section using immunochemistry. 

Protein of SEQ ID NO:298 (182-1-2-0-D12-CS) 

The protein of SEQ ID NO:298, encoded by the cDNA of SEQ ID NO:57, is homologous to 
proteins of the fibroblast growth factor family (FGF). Specifically, the amino acid sequence of SEQ ID 
NO:298 is identical to the recently described FGF-23. The protein of the invention is strongly expressed 
in the fetal liver. 

The protein of the invention presents the pfam signature for fibroblast growth factors (positions 
48 to 129). High resolution X-ray structures of crystals of both FGF-1 and FGF-2 have been reported and 
reveal a "beta trefoil" topology, comprising 12 strands linked to form a three-fold symmetrical structure 
made up of four-stranded antiparallel beta sheets (see Faham S. et al. - Curr Opin Struct Biol. - 1998, 8(5): 
p578-586). On the basis of sequence conservation, it seems very likely that all members of the FGF 
family have related 3-dimensional structures. Preferred polypeptides of the invention are those that 
comprise amino acids 39 to 45; 51 to 56; 60 to 64; 71 to 77; 82 to 87; 92 to 97; 101 to 105; 113 to 119; 
124 to 130; 142 to 147; 151 to 155 and/or 167 to 172, which by homology with other members of the 
FGF family make up the 12 beta strands characteristic of the FGF family (White K. et al. - Nat 
Genet. - 2000; 26(3): p345-348). Furthermore, as within these regions a number of amino acids from 
SEQ ID NO:298 are conserved in over 80% of human FGFs (after sequence alignment), the most 
preferred polypeptides of the invention comprise amino acids 42, 53, 63, 83, 85, 87, 93, 96, 101, 113, 
115, 124, 127 and/or 129. Other preferred polypeptides of the invention are any fragment of SEQ ID 
NO:298 having any of the biological activities described herein. 
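
By way of illustration only, the conserved positions recited above can be assigned to the recited regions. In the following Python sketch, the numbering of the regions from 1 to 12 simply follows the order in which they are listed and is an assumption, as is the helper name `strand_index`.

```python
from typing import List, Optional, Tuple

# Regions recited in the text for SEQ ID NO:298 (1-based, inclusive).
BETA_STRANDS: List[Tuple[int, int]] = [
    (39, 45), (51, 56), (60, 64), (71, 77), (82, 87), (92, 97),
    (101, 105), (113, 119), (124, 130), (142, 147), (151, 155), (167, 172),
]

# Positions recited as conserved in over 80% of human FGFs.
CONSERVED: List[int] = [42, 53, 63, 83, 85, 87, 93, 96, 101, 113, 115, 124, 127, 129]


def strand_index(position: int) -> Optional[int]:
    """Return the 1-based index of the region containing position, or None."""
    for index, (start, end) in enumerate(BETA_STRANDS, start=1):
        if start <= position <= end:
            return index
    return None


assignment = {pos: strand_index(pos) for pos in CONSERVED}
for pos, index in assignment.items():
    print(f"position {pos}: region {index}")
```

Under this numbering, every recited conserved position falls within one of the first nine listed regions, and none falls within the fourth region or the last three.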

Cytokines are a heterogeneous group of polypeptide mediators associated with numerous 
functions, including immune system and inflammatory responses. The cytokine families include, but are 
not limited to, Interleukins, Chemokines, Tumor necrosis factors, Interferons, Colony stimulating factors, 
Neurotrophins, neuropoietins and growth factors (of which the FGF family is a member). Fibroblast 
growth factors (FGFs) were first characterized, in the mid 1970s, as mitogens of cultured fibroblasts. 
Since then more than 20 different FGFs have been identified. Fibroblast growth factors belong to a 
family of proteins called growth factors (other members of this family include EGF, PDGF, TGFs and 
ECGF). The biological effects of FGFs are mediated by association with 3 biochemically distinct 
partners: heparan sulfate proteoglycans, a low-affinity transmembrane FGF-binding protein and 
high-affinity transmembrane FGF receptors of the tyrosine-kinase class. Transfection and reconstruction 
experiments have shown that intracellular signal transduction is triggered by activation of FGF receptor 
kinase activity. Activation is brought about by receptor oligomerization, which is mediated by the 
association of heparan sulfate proteoglycans with the ligand (FGF) and of the ligand with the receptor 
itself (Faham S. et al. - Curr Opin Struct Biol. - 1998, 8(5): p578-586). Longer heparin-derived 
oligosaccharides generally exhibit tighter binding to FGF. The relationship between heparin length, 
biological activity and FGF binding has been extensively studied and there is general agreement that 
longer heparin oligosaccharides tend to be more biologically active. FGFs are members of a family with 
a broad range of biological activities involving cell growth and differentiation (including angiogenesis, 
morphogenesis and wound healing); cell survival, replication, adhesion and mobility. FGFs have been 
found to be potent growth factors for a number of cell types including, but not limited to, fibroblasts, 
endothelial cells, smooth muscle cells, keratinocytes, osteoblasts and neurons. 

Clearly, FGF biology is potentially very complex, involving multiple ligands, receptors and 
cofactors, each expressed with different spatial and temporal patterns and distinct kinetics in the course of 
normal development. Considerable efforts have been expended on the creation of different types of 
animal models for the analysis of FGF function in vivo. These studies clearly indicate that FGF signaling 
is involved in a number of different processes at different stages of development and is critical in early 
developmental stages (FGF-4 and FGF receptor 1 homozygous null mutations both cause early lethality). 
FGF signaling has been found to be required for both branching morphogenesis of the lung and the 
establishment of the normal program of keratinocyte differentiation in the skin. FGF signaling has also 
been found to be involved in both the initial induction and sustained outgrowth of the limb bud during 
early limb development. Perhaps the most impressive illustration of this function of FGF signaling is the 
ability to induce supernumerary limb development in the chick by local application of FGF-soaked beads 
(Cohn M, et al. - Cell - 1995; 80: p739-746), thus indicating that at least some FGF-dependent processes 
are regulated by accessibility of an FGF ligand. FGFs are also capable of stimulating migration and 
differentiation of hepatic precursors. 

Recently, mutations in the FGF-23 gene were found to be associated with autosomal dominant 
hypophosphataemic rickets (ADHR), a genetically transmitted disease characterized by low serum 
phosphorus concentrations, rickets, osteomalacia, lower extremity deformities, short stature, bone pain 
and dental abscesses. It seems very likely that FGF signaling functions are involved in numerous aspects 
of morphogenesis, differentiation and other essential cellular mechanisms, and are thus likely involved in 
any of a large number of diseases and conditions associated with these processes. 

Thus, it is believed that the protein of SEQ ID NO:298 is a member of the fibroblast growth 
factor family, and is thus involved in a large number of cellular and organismal processes, including, but 
not limited to, cell growth and differentiation, angiogenesis, morphogenesis, wound healing, cell survival, 
replication, adhesion and mobility. 

One embodiment of the present invention relates to the use of the present polypeptides and 
polynucleotides to identify liver, heart, thyroid and parathyroid tissues, or cells derived from these tissues, 
since the protein of the invention is expressed therein (White K. et al. - Nat Genet. - 2000; 26(3): 
p345-348). Such detection of cells expressing the protein can be carried out in any of a number of ways, 
including the use of specific antibodies or antiserum generated against the protein using standard 
methods, as well as using polynucleotide probes specific for nucleic acids encoding the protein of the 
invention. 

In another embodiment, the protein of the invention or part thereof can be used as a mitogen to 
stimulate the growth of a number of different cell types including, but not limited to, fibroblasts, muscle 
cells, osteoblasts, keratinocytes and hepatocytes. The growth of cells can be stimulated in vitro, for 
example to promote the growth of cells cultured for the synthesis of recombinant proteins, or for ex vivo 
gene therapy applications. Another preferred application of this technique relates to the use of the protein 
of the invention or part thereof to generate in vitro tissues and organs including, but not limited to, skin, 
cartilage, and bone for transplants and grafts (Lancet - 1981, 1(8211): 75-8). 



Another preferred embodiment of the invention relates to the use of the protein of the invention or part thereof 
to treat damaged tissues and organs. Members of the FGF family have been shown to induce the 
differentiation and growth of a number of different cell types. Thus the protein of the invention can be 
administered to treat pathologies and conditions that result from damage to cells, tissues or organs. These 
pathologies and conditions include but are not limited to bone fractures and bone defects (Kimoto et al. - J 
Dent Res - 1998, 77(12): p1965-1969) (Solheim E - Int Orthop - 1998, 22(6)), damage due to wounds 
(such as lesions of the skin and ulcers) (Debus E. - Zentralbl Chir - 2000, 125 (Suppl 1): p49-55) (Szabo 
S. - Aliment Pharmacol Ther - 2000, 14(Suppl 1): p33-43), tissue damage due to ischemia (for example, 
in the brain and heart) (Simons M. - Circulation - 2000, 102(11): pE73-E86), cardiovascular diseases 
such as thrombosis and atherosclerosis (Bauters C - Drugs - 1999, 58 (Spec No 1): p11-15) and 
neurodegenerative diseases due to neuronal loss such as Parkinson's disease or Alzheimer's disease 
(Ebadi M - Neurochem Int - 1997, 30(4-5): p347-374) (Brundin P. - Cell Transplant - 2000, 9(2): 
p179-195). 

In a most preferred embodiment, the polypeptides or polynucleotides of the invention can be used 
to diagnose, treat, or prevent disorders resulting from non-functional and/or mutated FGFs, such as 
autosomal dominant hypophosphataemic rickets, which is associated with mutation of certain amino 
acids of FGF-23 (White K. et al. - Nat Genet. - 2000; 26(3): p345-348, which is hereby incorporated by 
reference in its entirety). Such disorders can be treated, for example, by administering a therapeutically 
effective amount of the protein of the invention or a polynucleotide sequence encoding the protein to a 
patient suffering from the disorder. Similarly, SEQ ID NO:298 or SEQ ID NO:57 or any part thereof can 
be used to develop diagnostic kits in order to diagnose, prevent and/or treat any other disease associated 
with FGF, for example pathologies associated with FGF overexpression. 

In yet another embodiment, the protein of the invention or part thereof can be used to develop 
antagonists of FGF and/or FGF receptors in order to treat disorders associated with an over-activation of 
FGF pathways (for example, due to overproduction of FGF or overstimulation of FGF receptors). This is 
particularly true for pathologies such as cancers where some tumors secrete large quantities of FGF, such 
as prostate and breast cancers. Furthermore, FGF antagonists can be useful in inhibiting tumor 
angiogenesis, which is an essential step in tumor growth. In the same way, SEQ ID NO:57 or any part 
thereof could be used to generate antisense oligonucleotides. Antisense oligonucleotides block 
complementary mRNA, thus inhibiting the synthesis of the protein encoded by the mRNA. These 
oligonucleotides can be used in in vivo or ex vivo treatment of the diseases caused or aggravated by 
overexpression of FGFs. 

Protein of SEQ ID No: 396 (internal designation: 160-12-1-0-D10-CS) 

The protein of SEQ ID No: 396 encoded by the cDNA of SEQ ID No: 155, overexpressed in 
brain and fetal brain, shows homology to members of the transmembrane 4 super family of proteins 
(TM4SF). The protein of the invention displays signatures characteristic of this family, namely the pfam 
domain from positions 66 to 273, the Prosite motif from positions 112 to 134 as well as the emotif 
domains from positions 108 to 127, 108 to 146, 129 to 150, 128 to 154, and 247 to 274. In addition, the 
protein of the invention has several predicted transmembrane domains: 103 to 123, 130 to 150 and 245 to 
265, with an additional predicted domain with lower certainty from positions 61 to 81. The protein of the 
invention has significant homology to a TM4SF member, the integral membrane CD81 antigen also 
known as TAPA1 (Target of Antiproliferative Antibody), except for its N-terminus. The transmembrane 
domains of the protein of the invention match those described for CD81. 
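
By way of illustration only, the recited coordinates can be compared with one another. The following Python sketch assumes 1-based, inclusive numbering; the helper names `length`, `contains` and `gap` are hypothetical and serve only to relate the predicted transmembrane domains to the pfam domain recited above.

```python
from typing import List, Tuple

Segment = Tuple[int, int]

# Coordinates recited in the text for SEQ ID No: 396 (1-based, inclusive).
PFAM_DOMAIN: Segment = (66, 273)
TM_DOMAINS: List[Segment] = [(103, 123), (130, 150), (245, 265)]
TM_LOWER_CERTAINTY: Segment = (61, 81)


def length(segment: Segment) -> int:
    """Number of residues in an inclusive segment."""
    return segment[1] - segment[0] + 1


def contains(outer: Segment, inner: Segment) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def gap(first: Segment, second: Segment) -> int:
    """Number of residues strictly between two non-overlapping segments."""
    return second[0] - first[1] - 1


for tm in TM_DOMAINS + [TM_LOWER_CERTAINTY]:
    print(tm, length(tm), "within pfam domain:", contains(PFAM_DOMAIN, tm))
print("residues between first and second domain:", gap(TM_DOMAINS[0], TM_DOMAINS[1]))
print("residues between second and third domain:", gap(TM_DOMAINS[1], TM_DOMAINS[2]))
```

Under these assumptions, each predicted domain spans 21 residues, and the three higher-certainty domains lie within the pfam domain, whereas the lower-certainty domain begins five residues upstream of it.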

Members of the tetraspan family of proteins are associated with adhesion molecules and translate 
adhesive events into a regulation of cellular behaviour. TAPA-1 is a widely expressed protein found to 
influence adhesion, morphology, activation, proliferation and differentiation of B, T and other cells. 
TAPA-1 has two long hydrophilic domains which are extracellular and located between 
the four transmembrane regions (TM1-4). The region between TM2 and TM3 is highly conserved in all 
tetraspanins. The protein is highly hydrophobic and contains a potential N-myristoylation site. TAPA1 
functions by forming a complex on the cell surface and the antigenic epitope of the human TAPA1 is 
contained within a subregion of the second extracellular domain of the protein. Cell-surface expression of 
TAPA1 can be down-modulated by binding of antibodies (Levy 1991, J Biol Chem Aug 
5;266(22):14597-602). 

Mice lacking CD81 (not expressing TAPA1) have impaired antibody responses to protein 
antigens. This defect is specific to antigens that preferentially stimulate a T helper 2 response and is only 
seen with T cell-dependent antigens. Absence of CD81 on B cells is sufficient to cause the defect. 
Antigen-specific interleukin (IL) 4 production is greatly reduced in the spleen and lymph nodes of 
CD81-null mice compared with heterozygous littermates. The expression of CD81 on B cells is critical for 
inducing optimal IL-4 and antibody production during T helper 2 responses. CD81 is likely to have a 
greater role in the control of immune responses than in the development of immune cells (Maecker (1997) 
J Exp Med 1997 Apr 21; 185(8): 1505-10). CD81 on B cells has the capacity to promote IL-4 secretion 
from T cells. Costimulatory proteins such as B7-1 and B7-2 have been shown in some systems to have 
differential effects on cytokine secretion by T cells. CD81 on B cells can control cytokine production by 
T cells. TAPA-1 has been implicated in the regulation of lymphoma cell 
growth. 

TAPA-1 is highly expressed in many neurons of the brainstem. TAPA is found in all glial cells, 
and the level of this protein correlates with their maturation (Sullivan et al., 1998, J Comp Neurol 1998 
Jul 6;396(3):366-80). This protein is expressed by ependyma, choroid plexus, astrocytes, and 
oligodendrocytes. TAPA1 is dramatically upregulated during early postnatal development, at the time of 
glial birth and maturation. At embryonic day 18, the levels of TAPA are low, with most of the 
immunoreaction product being associated with the ependyma, choroid plexus, and the glia limitans. The 
amount of TAPA expressed in the brain increases with brain development, and at postnatal day 14 the 
protein levels approach those of the adult. This increase in the levels of TAPA at postnatal day 14 is due 
to upregulation in the gray matter and white matter. TAPA has been associated with reactive gliosis and 
the glial scar. The spatiotemporal expression pattern of CD81 by reactive microglia and astrocytes 
indicates that CD81 is involved in the glial response to spinal cord injury. It is suggested that the 
upregulation of TAPA is an integral component of glial scar formation (Peduzzi et al, Exp Neurol. 1999 
Dec;160(2):460-8). 

The levels of TAPA-1 are low in metastatic prostate tumors, and expression of this protein in these 
cells appears to suppress metastatic behavior (Dong et al., 1995 Science. 12;268(5212):884-6.). Bivalent 
antibodies directed against these proteins can be used to enhance adhesion of different cell types, such as pre-B 
cells (Masellis-Smith and Shaw (1994) J Immunol 1994 Mar 15;152(6):2768-77) and endothelial cells 
(Forsyth, 1991 Immunology Feb;72(2):292-6), and to affect tumor cell mobility and invasiveness (Miyake et al., 
1991 J Exp Med. Dec 1; 174(6): 1347-54.). In the nervous system, the migratory behavior of Schwann 
cells over biologically relevant substrates can be enhanced with the application of antibodies directed 
against certain TM4SF members (Anton et al (1995) J Neurosci. Jan;15(1 Pt 2):584-95). Antibodies directed against 
TAPA-1 depress mitotic activity and induce an increase in cellular adhesion (Oren et al, 1990 Mol 
Cell Biol Aug;10(8):4007-15). 

It is believed that the protein of SEQ ID NO: 396 or part thereof plays a role in cell adhesion, 
motility, metastasis, cell activation, signal transduction and the immune response, probably as a member 
of the TM4SF family. As a member of the tetraspanin family of proteins, the protein of SEQ ID No: 396 
or part thereof is believed to mediate cellular interaction in lymphoid cells as well as 
non-hematolymphoid tissue to affect cell adhesion and migration, and to alter cell morphology and the activation 
state of a cell. Preferred polypeptides of the invention are polypeptides comprising the amino acids of 
SEQ ID NO: 396 from positions 66 to 273, 112 to 134, 108 to 127, 108 to 146, 129 to 150, 128 to 154, 
and 247 to 274. Other preferred polypeptides of the invention are fragments of SEQ ID NO: 396 having 
any of the biological activities described herein. The activity of the protein of the invention or part thereof 
may be assayed using any of the assays known to those skilled in the art including a 
functional tissue assay used to define surface antigens regulating astrocyte growth (Eldon et al, 1996, J 
Neurosci, 16(17):5478) and cellular function assays determining the involvement of the protein in signal 
transduction and cell adhesion in the immune system (Levy et al, 1998 Ann. Rev. Immunol. 16:89-109, 
Virtaneva et al, 1994 Immunogenetics 39: 329-334). 

An embodiment of the present invention relates to methods of using the protein of the invention 
or part thereof to identify and/or quantify membrane proteins, preferably integrins, lineage specific 
molecules, tetraspanins, and antibodies, in a biological sample. The protein may thus be used in assays and diagnostic kits 
for the quantification of such membrane proteins in tissue samples and in mammalian cell cultures. The 
binding activity of the protein of the invention or part thereof may be assessed using the assay described 
in Shoshana et al, 1998, Annu. Rev. Immunol 16: 89-109; Maecker et al, 1998 PNAS 95: 2458-2462; 
Geisert et al, 1996, J of Neuroscience 16(17): 5478-5487 or any other method familiar to those skilled in 
the art. Preferably, a defined quantity of the protein of the invention or part thereof is added to the sample 
under conditions allowing the formation of a complex between the protein of the invention or part thereof 
and the membrane protein to be identified and/or quantified. Then, the presence of the complex and/or 
the free protein of the invention or part thereof is assayed and optionally compared to a control using any 
of the techniques known by those skilled in the art. 

In another embodiment, the invention relates to compositions and methods using the protein of 
the invention or part thereof to stimulate cell proliferation, preferably proliferation of lymphoid cells both 
in vitro and in vivo. For example, soluble forms of the protein of the invention or part thereof may be 
added to cell culture medium in an amount effective to stimulate cell proliferation. 

In another embodiment, the invention relates to compositions and methods using the protein of
the invention or part thereof or derivative thereof to stimulate antibody production either in vitro or in
vivo. In a preferred embodiment, the protein of the invention or part thereof or derivative thereof may be
added to an in vitro culture of antibody-producing cells, such as hybridomas, in an amount effective to
stimulate antibody production. In another preferred embodiment, the protein of the invention or part thereof
or derivative thereof may be injected into an animal in order to increase the animal's antibody production
to a protein of interest in the case of production of polyclonal antibodies.

In still another embodiment, the invention relates to compositions and methods using the protein
of the invention or part thereof or derivative thereof to decrease cell adhesion either in vitro or in vivo. In
a preferred embodiment, the protein of the invention or part thereof or derivative thereof may be added to
an in vitro culture of adherent cells in an amount effective to prevent and/or inhibit cell adhesion, in order
to recover those adherent cells.

In still another embodiment, the invention relates to compositions and methods using the protein
of the invention or part thereof to treat and/or prevent cell-proliferative disorders, such as cancers,
preferably brain cancer, via the prevention of metastasis, and disorders characterized by a depressed immune
response, such as autoimmune diseases, AIDS, allergy, type I diabetes, systemic lupus erythematosus,
chronic rheumatoid arthritis, juvenile rheumatoid arthritis, Sjogren's syndrome, systemic sclerosis, mixed
connective tissue disease and dermatomyositis, Hashimoto's disease, primary myxedema, thyrotoxicosis,
pernicious anemia, ulcerative colitis, autoimmune atrophic gastritis, idiopathic Addison's disease, male
infertility, Goodpasture's syndrome, acute progressive glomerular nephritis, myasthenia gravis, multiple
myositis, pemphigus vulgaris, bullous pemphigoid, sympathetic ophthalmia, multiple sclerosis,
autoimmune hemolytic anemia, idiopathic thrombocytopenic purpura, postmyocardial infarction
syndrome, rheumatic fever, lupoid hepatitis, primary biliary cirrhosis, Behcet's syndrome and CREST
syndrome, via the stimulation of antibody production and IL-4 secretion.

In another embodiment, the invention relates to methods and compositions using the protein of
the invention or part thereof as a marker protein to selectively identify tissues, preferably from brain and
fetal brain origin. For example, the protein of the invention or part thereof may be used to synthesize specific
antibodies using any techniques known to those skilled in the art including those described herein. Such
tissue-specific antibodies may then be used to identify tissues of unknown origin, for example, forensic
samples or differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate
different tissue types in a tissue cross-section using immunochemistry or any other technique known to
those skilled in the art.

Protein of SEQ ID NO: 296 (internal designation 181-3-3-0-B8-CS)

The protein of SEQ ID NO: 296 encoded by the cDNA of SEQ ID NO: 55, overexpressed in fetal
liver, is homologous to the whole domain IV-4 and IV-5 of the basement membrane-specific heparan
sulfate proteoglycan core protein (perlecan), well conserved among C. elegans, mice and humans
(accession numbers Q06561, Q05793 and P98160 respectively). The 247-amino-acid-long protein of the
invention displays two putative hydrophobic stretches from positions 44 to 64 and 219 to 239 and a
putative immunoglobulin domain from positions 141 to 197, homologous to the Ig domain 4 of the Ig repeat
structure of domain IV of perlecan proteins. The protein of the invention also displays a putative secreted
signal peptide from positions 6 to 21.

Basement membranes are specialized regions of extracellular matrix (ECM) containing a large
number of different components, including laminin, collagen, nidogen and heparan sulfate proteoglycans
(for a review see Bernfield et al., Annu. Rev. Biochem. 68:729-777 (1999)). Perlecan, a major basement
membrane component, plays important roles in many fundamental developmental and regenerative processes,
including cell cohesion, adhesion and migration, signal transduction, and even gene regulation (Martin
and Timpl, Annu. Rev. Cell Biol. 3:57-85 (1987)). The cDNA sequence of perlecan encodes a large core
protein consisting of five structural domains, referred to as I to V, with distinct motifs such as SEA
modules (domain I), LDL class A modules (domain II), cysteine-rich LE modules (domain III), and LAMB
modules (domain V). Domain IV consists of Ig-like repeats (14 in mice, 21 in human perlecan) similar to
those of neural cell adhesion molecules (N-CAMs). Glycosaminoglycan chains are mostly linked to
domain I and have been shown to participate with the core protein in its differential expression in tissues
and developmental stages (Perrimon and Bernfield, Nature 404:725-728 (2000)).

The N-terminal fragment IV containing Ig modules 1 to 8 shows high-affinity binding to the
two known nidogen isoforms and to the laminin-1-nidogen-1 complex (LN), and binding to heparin at physiological
ionic strength (Hopf et al., Eur. J. Biochem. 259:917-925 (1999)). An alteration study of the C. elegans
perlecan shows in vivo that mutations in Ig-like modules 3 and 4 induce a lethal phenotype by inhibiting
the spatial distribution of the splice variants. Mutations inducing deletions in other Ig-like modules of
perlecan domain IV affect neither the isoform expression nor the spatial distribution, suggesting a
crucial role of Ig-like modules 3 and 4 in muscle assembly and developmental stages (Mullen et al., Mol.
Biol. Cell 10:3205-3221 (1999)).

Several studies have shown the widespread presence of distinct perlecan isoforms through regulated
alternative splicing in C. elegans (Rogalski et al., Genes Dev. 7:1471-1484 (1993); Rogalski et al.,
Genetics 139:159-169 (1995)). Although splice variants have not yet been shown in humans, Ig-like
modules are encoded by multiple exons compatible with different combinatorial possibilities of
expression (Cohen et al., P.N.A.S. 90:10404-10408 (1993)). Alternative splicing within Domain IV is
associated with temporal and spatial differences in isoform expression. A subset of C. elegans isoforms is
associated with body-wall muscles during embryogenesis and is required for nematode myofilament
lattice assembly, which is very similar to the assembly of focal adhesions in mammalian cell culture
(Moerman and Fire, CSH Lab. Press (1997)).

A basement membrane-like structure containing perlecan, collagen IV and laminin also plays a major
role during liver differentiation by interacting with immature hepatocytes (Rescan et al., Am. J. Path.
142:199-208 (1993)).

Perlecan has been implicated in a number of processes and diseases resulting from the alteration
of its structure, including glomerular filtration deficiencies such as proteinuria, diabetic glomerulopathies,
nephrotic syndromes, Denys-Drash syndromes (Groffen et al., Nephrol. Dial. Transplant 14:2119-2129
(1999)), mitogenesis and angiogenesis diseases (Aviezer et al., Cell 79:1005-1013 (1994)), inflammation
and tissue repair, ocular and skeletal defect syndromes, and microbial pathogenesis through invasion. Perlecan
core protein has binding epitopes for the basement membrane proteins nidogen-1, nidogen-2, and
fibulin-2, as well as for Alzheimer's beta-amyloid protein (Snow et al., Arch. Biochem. Biophys. 320:84-95
(1995)) and platelet growth factor.

It is believed that the protein of SEQ ID NO: 296 or part thereof is a basement membrane-like
protein, preferably a human isoform of the perlecan protein. Thus, the protein of the invention plays an
important role in membrane integrity and interactions with other basement proteins and particularly with
nidogen-1 and 2, LN complexes and heparin compounds. In addition, the protein of the invention is
thought to participate in the interactions with cellular receptors such as integrins, in cytokine release
and proteolysis, and in the regulation of angiogenesis, wound healing and tumor invasion. Being
overexpressed in the fetal liver, the protein of the invention is thought to participate in the differentiation,
migration and adhesion of hepatocytes through its spatial and temporal expression during embryogenesis.
Preferred polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID NO: 296
from positions 141 to 197. Other preferred polypeptides of the invention are fragments of SEQ ID NO:
296 having any of the biological activities described herein. The activity of the protein of the invention or
part thereof may be assayed using any of the assays known to those skilled in the art including those
described in Hopf et al., Eur. J. Biochem. 259:917-925 (1999) for binding assays with other membrane
proteins and in Rescan et al., Am. J. Path. 142:199-208 (1993) for protein assays (immunochemistry and
ELISA).

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a new marker protein to selectively identify embryogenic stages, preferably in
liver tissues. For example, the protein of the invention or part thereof may be detected using specific
antibodies able to bind to the protein using any technique known to those skilled in the art. Such
tissue-specific antibodies may then be used to identify embryogenic cells with dysregulated membrane
components such as in differentiated tumor cells or to differentiate different cell types in a tissue
cross-section using immunochemistry. For example, the amount of the protein of the invention in embryogenic
cells reflecting the characterized overexpression activity is measured and compared to that of a normal
cell using a specific antibody detected by fluorescence (FACS, confocal microscopy, etc.) or any other
detection method known to those skilled in the art.

In another embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof for the diagnosis of a disorder associated with overexpression of the protein of
the invention, preferably but not limited to perlecan-associated tumors such as human melanoma,
proliferative diseases, glomerular filtration deficiencies such as proteinuria, diabetic glomerulopathies,
nephrotic syndromes, Denys-Drash syndromes, mitogenesis and angiogenesis diseases, inflammation and
tissue repair, ocular and skeletal defect syndromes, and microbial pathogenesis through invasion. The
expression of the protein of the invention could be investigated using any methods well known to those
skilled in the art, including Northern blotting, RT-PCR or immunoblotting using specific antibodies
binding to the protein of the invention.

In still another embodiment, the protein of the invention or part thereof could be used as a mitogen
to stimulate the growth and differentiation of a number of different cell types including but not limited to
fibroblasts, muscle cells, osteoblasts, keratinocytes and hepatocytes. In a preferred embodiment, the
protein of the invention or part thereof is used in in vitro cultures such as those used for synthesis of
recombinant proteins. The protein of the invention or part thereof is added to the culture in an amount
effective to stimulate proliferation and/or differentiation. A more preferred application of this technique
relates to the use of the protein of the invention or part thereof in generating or repairing in vitro tissues
and organs such as but not limited to skin, cartilage, and bone for transplants and grafts (Lancet 1981,
1(8211):75-8, which disclosure is hereby incorporated by reference in its entirety).

In another embodiment, an antagonist of the protein of SEQ ID NO: 296 may be administered to a
subject to treat or prevent a cell proliferative disorder. Such disorders may include, but are not limited to,
arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD),
myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary
thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma,
sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland, bladder, bone, bone marrow,
brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary,
pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus. In
one aspect, an antibody which specifically binds to the protein of the invention may be used directly as an
antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells
or tissue which express the protein of the invention. In another example, antisense nucleotides, triple
helices, Genetic Suppressor Elements (GSE), or ribozymes designed from nucleotides encoding the protein
of the invention or part thereof using any method known to those skilled in the art are administered to inhibit
the expression of the protein of the invention.

Protein of SEQ ID NO: 410 (internal designation 179-9-4-0-B8-CS)

The protein of SEQ ID NO: 410 encoded by the cDNA of SEQ ID NO: 169 found in fetal kidney is
homologous to proteins of the ankyrin family and to proteins containing a characteristic ankyrin
repeat motif (pfam accession number PF00023). The protein of the invention shows homology with
human ankyrin proteins (PIR accession number A35049; SP-TREMBL accession number Q99407),
ankyrins from several different eukaryote species (Drosophila melanogaster: STR accession number
Q9VAU5; mouse: STR accession number Q61302 and SwissProt accession number Q02357; cow:
STR accession number AAF61702; Arabidopsis thaliana: STR accession number Q9ZQ79) and
ankyrins from a virus (Paramecium bursaria Chlorella virus: STR accession number
Q41164).

In addition, the protein of the invention shows homology with other proteins containing the ankyrin
repeat motif. The ankyrin repeat motif is a 33 amino acid motif and has an L-shaped structure consisting
of two alpha helices following the beta hairpin loop (Gorina et al., Science 274:1005 (1996)). Examples
of proteins comprising ankyrin repeats include: channels, enzymes, toxins, transcription factors (Palek et
al., Semin. Hematol. 27:290-332 (1990)), tankyrase (Smith et al., Science 282:1484-1487 (1998)),
multiple proteins involved in signal transduction, in particular integrin-linked kinases (Huang et al., Int.
J. Mol. Med. 3:563-572 (1999)), inhibitors of cyclin-dependent kinases (Baumgartner et al., Structure
6:1279-1290 (1998)), death-associated protein kinase involved in apoptosis (Raveh et al., Proc. Natl.
Acad. Sci. USA 97:1572-1577 (2000)) and many others.

The ankyrin motif is also found in the protein of the invention (positions 47 to 79).
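
The stated motif location can be checked against the 33 amino acid repeat length given above: counted inclusively from residue 1, positions 47 to 79 span 79 - 47 + 1 = 33 residues. The following is only a minimal sketch of that position bookkeeping. The function names are hypothetical, and the placeholder string is an arbitrary stand-in, not the sequence of SEQ ID NO: 410.

```python
def fragment_length(start: int, end: int) -> int:
    """Number of residues in a fragment given 1-based, inclusive positions."""
    if start < 1 or end < start:
        raise ValueError("positions must satisfy 1 <= start <= end")
    return end - start + 1


def fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


# Arbitrary placeholder of 100 residues (assumption, NOT SEQ ID NO: 410).
PLACEHOLDER = "ACDEFGHIKLMNPQRSTVWY" * 5

motif_length = fragment_length(47, 79)          # 33
motif_residues = fragment(PLACEHOLDER, 47, 79)  # 33-character substring
```

The same helpers apply to any other position range recited in the text, provided the positions are read as 1-based and inclusive.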

Ankyrins are peripheral membrane proteins which have been found in erythrocyte, kidney and
neuronal cells of mammals. Cells contain a cytoskeleton that links intracellular compartments with each
other and the plasma membrane. Associations between the cytoskeleton and the lipid membranes
bounding these compartments involve spectrin, ankyrin, and integral membrane proteins. Spectrin is a
major component of the cytoskeleton and acts as a scaffolding protein. Similarly, ankyrin acts to tether
the actin-spectrin moiety to membranes and to regulate the interaction between the cytoskeleton and
membranous compartments. Different ankyrin isoforms are specific to different organelles and provide
specificity for this interaction. Genes coding for three different mammalian ankyrins (ankyrin R, ankyrin B
and ankyrin G) have been cloned. Ankyrin R was originally identified as part of the erythrocyte membrane
skeleton, and was recently also localized to the plasma membrane of a subpopulation of post-mitotic
neurons in rat brain (Lambert, et al., 1993, J. Neurosci., 13, 3725-3735). Ankyrin B is a developmentally
regulated human brain protein which has two alternatively spliced isoforms with molecular masses of 220
kilodaltons (kD) and 440 kD (Kunimoto, et al., 1991, J. Cell Biology, 115, 1319-1331). Ankyrin G is a
more recently isolated human gene that encodes two neural-specific ankyrin variants (480 kD and 270
kD), which have been localized to the axonal initial segment and node of Ranvier (Kordeli, et al., 1995, J.
Biol. Chem., 270, 2352-2359). Studies on mammalian ankyrins indicate that ankyrins bind a variety of
proteins which have functions involved with the anion exchanger (Drenckhahn, et al., 1988, Science, 230,
1287-1289), Na+/K+-ATPase, amiloride-sensitive sodium channel in kidney (Smith, et al., 1991, Proc.
Natl. Acad. Sci. U.S.A., 88, 6971-6975), voltage dependent sodium channel of the brain and the
neuromuscular junction (Srinivasan, et al., 1988, Nature, 333, 177-180), and nervous system cell adhesion
molecules (Davis, et al., 1994, J. Biol. Chem., 269, 27163-27166).

Analyses of mammalian ankyrins have revealed that these large proteins are divided into three
functional domains. These include an N-terminal membrane-binding domain of about 89-95 kD, a
spectrin-binding domain of about 62 kD, and a C-terminal regulatory domain of about 50-55 kD. The
membrane-binding domain is composed primarily of tandem repeats of about 33 amino acids each. This
domain usually has about 22-24 copies of these repeats. The repeat units appear to function in binding to
membrane proteins such as anion exchangers, sodium channels, and certain adhesion molecules. The
spectrin-binding domain, as the name implies, functions in binding to the spectrin-based cytoskeleton of
cells positioned inside the plasma membrane. Finally, the regulatory domain, which is the most variable
domain among the different ankyrins that have been studied, appears to function as a repressor and/or
an activator of the protein-binding activities of the other two domains. Some of the variability seen in this
domain among different ankyrin species appears to be the result of alternative splicing of nascent
transcripts. The regulatory domain can respond to cellular signals, allowing remodeling of the
cytoskeleton during the cell cycle and differentiation (Lambert, S. and Bennett, V. (1993) Eur. J.
Biochem. 211:1-6). Ankyrin may be a target for the action of parasites. Erythrocyte ankyrin is cleaved by
parasite proteases of Plasmodium falciparum, destabilizing the erythrocyte membrane skeleton, which
facilitates parasite release (Raphael et al., Mol Biochem Parasitol. 110(2):259-272 (2000)). Recently,
novel ankyrin proteins have been isolated from Dirofilaria and Brugia which may be useful in protecting
animals, including humans, from diseases caused by parasitic helminths (United States Patent No.
6,063,599).

Ankyrin sequences have been identified in various libraries, at least 50% of which are associated
with cancer and at least 23% of which are associated with the immune response. Of particular note is the
expression of ANFP in reproductive, hematopoietic/immune, and gastrointestinal tissues. See United
States Patent 5,989,863.

It is believed that the protein of SEQ ID NO: 410 is a member of the family of human ankyrin
proteins and as such plays a role in regulating the interaction between the cytoskeleton and membranous
components. The identification of a new member of the ankyrin family and the polynucleotides encoding
it addresses a need in the art by providing new compositions which are useful in the diagnosis,
prevention, and treatment of autoimmune/inflammatory, cell proliferative, and vesicle trafficking disorders
and in modulating the response to infectious diseases.

Preferred polypeptides of the invention are polypeptides comprising the amino acids from
positions 47 to 79. Other preferred polypeptides are fragments of SEQ ID NO: 410 having the desired
biological activity. Further included in the invention are the polypeptides encoded by the human cDNA
of clone 179-9-4-0-B8-CS. The polypeptides of SEQ ID NO: 410 may be interchanged with the
corresponding polypeptides encoded by the human cDNA of clone 179-9-4-0-B8-CS. Further included in
the invention are polynucleotides encoding said polypeptides. Preferred polynucleotides are those of SEQ
ID NO: 169 and of the human cDNA of clone 179-9-4-0-B8-CS.

The invention also encompasses variants of the protein of the invention. A preferred variant is one
which has at least about 80%, more preferably at least about 90%, and most preferably at least about 95%
amino acid sequence identity to the amino acid sequence of SEQ ID NO: 410, and which contains at least
one functional or structural characteristic of ankyrin.

In a particular embodiment, the invention encompasses a polynucleotide sequence comprising the
sequence of SEQ ID NO: 410, as well as variants of that sequence. Variants which encode at least one
functional region characteristic of the ankyrin protein of the present invention are encompassed. Codon
usage may be varied according to standard techniques in order to enhance expression in various hosts.
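
The identity thresholds recited for preferred variants (at least about 80%, 90% and 95%) can be illustrated with a short calculation. The sketch below is only one possible convention and is not taken from this passage: it assumes the two sequences have already been aligned by some external tool, counts identical residues over every alignment column that is not a gap in both sequences, and treats the "about" thresholds as exact cut-offs. The function names and example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences have a gap are ignored;
    a column with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap and res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the tiers recited in the text."""
    if pct >= 95.0:
        return "most preferred"
    if pct >= 90.0:
        return "more preferred"
    if pct >= 80.0:
        return "preferred"
    return "below threshold"


# Toy example: 9 identical residues out of 10 aligned columns.
example_pct = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
example_tier = identity_tier(example_pct)
```

Other conventions (for example, dividing by the length of the shorter sequence, or excluding all gapped columns) give different percentages, so the convention would need to be fixed before comparing a candidate variant against these thresholds.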

In one embodiment, the protein of SEQ ID NO: 410 or a fragment or derivative thereof may be
administered to a subject to treat or prevent a disorder associated with decreased expression or activity of
ankyrin. Examples of such disorders include, but are not limited to, autoimmune/inflammatory disorders
such as acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress
syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, autoimmune
hemolytic anemia, autoimmune thyroiditis, autoimmune polyendocrinopathy-candidiasis-ectodermal
dystrophy (APECED), bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis,
dermatomyositis, diabetes mellitus, emphysema, episodic lymphopenia with lymphocytotoxins,
erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's
syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome,
multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis,
pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjogren's
syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic
purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and
extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and
trauma; cell proliferative disorders such as actinic keratosis, arteriosclerosis, bursitis, cirrhosis, hepatitis,
mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria,
polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma,
leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, cancers of the
adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal
tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin,
spleen, testis, thymus, thyroid, and uterus; and vesicle trafficking disorders such as cystic fibrosis,
glucose-galactose malabsorption syndrome, hypercholesterolemia, diabetes mellitus, diabetes insipidus,
hyper- and hypoglycemia, Graves' disease, goiter, Cushing's disease, ulcerative colitis, and gastric
and duodenal ulcers.

In another embodiment, a vector capable of expressing the protein of SEQ ID NO: 410 or a
fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated
with decreased expression or activity of ankyrin including, but not limited to, those described above.

In a further embodiment, a pharmaceutical composition comprising a substantially purified
protein of SEQ ID NO: 410 or a portion of the protein in conjunction with a suitable pharmaceutical
carrier may be administered to a subject to treat or prevent a disorder associated with decreased
expression or activity of the same or a similar protein including, but not limited to, those provided above.

In still another embodiment, an agonist of the protein of SEQ ID NO: 410 which modulates the
activity of the protein may be administered to a subject to treat or prevent a disorder associated with
decreased expression or activity of the protein, or other ankyrin proteins, including, but not limited to,
those listed above.

In another embodiment, the polypeptide of SEQ ID NO: 410 may be used to produce antagonists
using methods which are generally known in the art. In particular, purified polypeptide may be used to
produce antibodies or to screen libraries of pharmaceutical agents to identify those which specifically
bind ankyrin proteins. Neutralizing antibodies (i.e., those which inhibit dimer formation) can also be
prepared for therapeutic use.

In a further embodiment, an antagonist of the protein of SEQ ID NO: 410 may be administered
to a subject to treat or prevent a disorder associated with increased expression or activity of the same
protein or other members of the ankyrin family of proteins. Such disorders may include, but are not
limited to, those discussed above. In one aspect, an antibody which specifically binds the claimed
polypeptide may be used directly as an antagonist or indirectly as a targeting or delivery mechanism for
bringing a pharmaceutical agent to cells or tissue which express the polypeptide.

In an additional embodiment, a vector expressing the complement of the polynucleotide of SEQ
ID NO: 169 may be administered to a subject to treat or prevent a disorder associated with increased
expression or activity of ankyrin proteins including, but not limited to, those described above.

In other embodiments, any of the proteins, antagonists, antibodies, agonists, complementary 
sequences, or vectors of the invention may be administered in combination with other appropriate 
therapeutic agents. The combination of therapeutic agents may act synergistically to effect the treatment 
or prevention of the various disorders described above. Using this approach, one may be able to achieve 
therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects.

In another embodiment of the invention, the polynucleotides encoding the polypeptide of SEQ
ID NO: 410, or any fragment or complement thereof, may be used for therapeutic purposes. In one
aspect, the complement of the polynucleotide encoding the above-identified polypeptide may be used in
situations in which it would be desirable to block the transcription of the mRNA. In particular, cells may
be transformed with sequences complementary to polynucleotides encoding the polypeptide. Thus,
complementary molecules or fragments may be used to modulate the activity of the claimed polypeptide
or related ankyrin proteins, or to achieve regulation of gene function.

In another embodiment of the invention, the nucleotide sequence encoding the polypeptide of
SEQ ID NO: 410 can be used to turn off the genes expressing the polynucleotide or related ankyrin
proteins. In particular, a cell or tissue can be transformed with expression vectors which express high
levels of the polynucleotide, or fragment thereof. Such constructs may be used to introduce untranslatable
sense or antisense sequences into the cell. Expression vectors derived from retroviruses, adenoviruses, or
herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of the nucleotide
sequences to a targeted organ, tissue, or cell population.

An additional embodiment of the invention relates to the administration of a pharmaceutical or 
sterile composition, in conjunction with a pharmaceutically acceptable carrier, for any of the therapeutic
effects discussed above. The composition may be delivered via a variety of different routes. 

In another embodiment, antibodies which bind the polypeptide of SEQ ID NO: 410 may be used
for the diagnosis of disorders characterized by expression of ANFP, or in assays to monitor patients being
treated with the polypeptide, other ankyrin proteins or agonists, antagonists, or inhibitors of the same. A
variety of assay types, including ELISAs, RIAs, and FACS, can be used.

In another embodiment of the invention, the polynucleotide of SEQ ID NO: 169 itself may be
used for diagnostic purposes. The polynucleotide can be used to generate oligonucleotide sequences,
complementary RNA and DNA molecules, and PNAs which are useful in diagnosis. The polynucleotide
and related molecules may be used to detect and quantitate gene expression in biopsied tissues in which
expression of the polypeptide of SEQ ID NO: 410 or other ankyrin polypeptides may be correlated with
disease. The diagnostic assay may be used to determine absence, presence, and excess expression of the
polypeptides, and to monitor regulation of polypeptide levels during therapeutic intervention. Examples
of diagnostic methods include Southern or Northern analysis, dot blot, or other membrane-based
technologies; PCR technologies; dipstick, pin, and multiformat ELISA-like assays; and
microarrays utilizing fluids or tissues from patients to detect altered ANFP expression. Such qualitative or
quantitative methods are well known in the art. Such assays may also be used to evaluate the efficacy of a
particular therapeutic treatment regimen in animal studies, in clinical trials, or to monitor the treatment of
an individual patient.

In further embodiments, oligonucleotides or longer fragments derived from any of the polynucleotide 
sequences described herein may be used as targets in a microarray. The microarray can be used to monitor 
the expression level of large numbers of genes simultaneously and to identify genetic variants, mutations,
and polymorphisms. This information may be used to determine gene function, to understand the genetic
basis of a disorder, to diagnose a disorder, and to develop and monitor the activities of therapeutic agents. 

In another aspect of the invention, the polypeptide may be used to stimulate the expression of 
genes that have a role in organ and organ system development. Thus, in a preferred embodiment, the 
protein of the invention, a fragment, or derivative thereof, may be administered to a subject to treat or
prevent developmental disorders. 

In a further embodiment, the protein of the invention may be administered to a subject to treat or
prevent a cardiovascular disorder. Such disorders can include, but are not limited to, arteriosclerosis
including atherosclerosis and nonatheromatous arteriosclerosis, hypertension, stroke, coronary artery
disease, ischemia, myocardial infarction, angina pectoris, cardiac arrhythmias, sinoatrial node blocks,
atrioventricular node blocks, chronic hemodynamic overload, aneurysm, Jervell and Lange-Nielsen
syndrome, and long QT syndrome. The protein of the invention may also be used as a marker of cardiac
hypertrophy and may therefore be included in a diagnostic kit for this condition.

In another embodiment of the invention, the polypeptide and/or polynucleotide may be used to 
inhibit cellular proliferation and to treat and/or diagnose disorders associated with cellular proliferation
including but not limited to cancer. 

In a further embodiment of the invention, an antagonist of the protein of the invention may be 
administered to a subject to treat or prevent a cancer. In one aspect, an antibody which specifically binds 
the protein of the invention may be used directly as an antagonist or indirectly as a targeting or delivery 
mechanism for bringing a pharmaceutical agent to cells or tissue which express the protein of the
invention. 

In yet another embodiment, an antagonist of the protein of the invention may be administered to a
subject to treat or prevent a neuronal disorder. Such a disorder may include, but is not limited to,
akathisia, Alzheimer's disease, amnesia, amyotrophic lateral sclerosis, bipolar disorder, catatonia,
cerebral neoplasms, dementia, depression, diabetic neuropathy, Down's syndrome, tardive dyskinesia,
dystonias, epilepsy, Huntington's disease, peripheral neuropathy, multiple sclerosis, neurofibromatosis,
Parkinson's disease, paranoid psychoses, postherpetic neuralgia, schizophrenia, and Tourette's disorder. In
one aspect, an antibody which specifically binds the protein of the invention may be used directly as an
antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells
or tissue which express the protein of the invention.

In another embodiment, the protein of the invention can be administered to a subject to treat or
prevent malaria. The protein of the invention may also be used in the diagnosis of malaria.

In yet another embodiment, the polynucleotide and/or the polypeptide of the present invention can
be used as a therapeutic composition capable of protecting an animal from a disease caused by a parasitic
helminth. The polypeptide can be used as a target for antiparasitic vaccines and drugs.

Ankyrin has been shown to underlie membrane proteins including CD44, the voltage-dependent
sodium channel, Na+/K+ ATPase and the anion exchanger protein. It is believed that the formation of a
direct connection between ankyrin and functionally important transmembrane proteins/membrane
skeleton may be one of the earliest events to occur during signal transduction and cell activation. Thus, in
a further embodiment, the polypeptide of the present invention can be used to disrupt the connection
between ankyrin and the membrane, thus affecting fundamental processes within the cell.

The polypeptide of the present invention can be further used to screen for compounds that inhibit
or enhance the binding of ankyrin binding proteins and to affect the association between ankyrin and
proteins, such as Alpha-Na,K-ATPase, which are critical to intracellular transport of ions and nutrients.

In yet another embodiment the regulatory domain of the polypeptide of SEQ. ID. NO. 410 or
antagonists thereof can be used to enhance or disrupt the protein binding activities of the other domains. 

Proteins of SEQ ID NOs: 385 and 416 (internal designations 105-021-3-0-C3-CS and 188-31-1-0-E6-CS
respectively)

The 354-amino-acid protein of SEQ ID NO: 385, encoded by the cDNA of SEQ ID NO: 144
found in brain, displays 6 kelch motifs (pfam accession number PF01344) at positions 20-66, 68-114,
116-162, 164-209, 211-265 and 270-316. Moreover, 4 residues conserved in over 90% of kelch family
sequences are found in the protein of the invention: di-glycine at positions 133-134, tyrosine 148 and
tryptophan 155. In addition, six residues separate tyrosine 148 and tryptophan 155: this feature is
conserved in over 70% of kelch proteins (Adams et al., Trends in Cell Biology, 10:17-24, 2000). The
protein of the invention encoded by the cDNA of SEQ ID NO: 144 is a polymorphic variant of the
protein of SEQ ID NO: 416 encoded by the cDNA of SEQ ID NO: 175, and the two are thought to have
the same functions and utilities.
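The motif coordinates given above can be checked for internal consistency with a short script. The following is only an illustrative sketch: the coordinates and residue positions are copied from the preceding paragraph, while the function names are hypothetical and no sequence data from SEQ ID NO: 385 is used.

```python
# Kelch motif coordinates (1-based, inclusive) as stated in the text for SEQ ID NO: 385.
KELCH_MOTIFS = [(20, 66), (68, 114), (116, 162), (164, 209), (211, 265), (270, 316)]
PROTEIN_LENGTH = 354  # length of the protein as stated in the text


def motif_lengths(motifs):
    """Return the length in residues of each motif."""
    return [end - start + 1 for start, end in motifs]


def motifs_are_ordered_and_disjoint(motifs, length):
    """Check that motifs are in ascending order, do not overlap, and fit in the protein."""
    previous_end = 0
    for start, end in motifs:
        if not (previous_end < start <= end <= length):
            return False
        previous_end = end
    return True


def containing_motif(position, motifs):
    """Return the 1-based index of the motif containing a residue, or None."""
    for index, (start, end) in enumerate(motifs, start=1):
        if start <= position <= end:
            return index
    return None


def residues_between(pos_a, pos_b):
    """Count the residues lying strictly between two positions."""
    return abs(pos_b - pos_a) - 1


if __name__ == "__main__":
    print(motif_lengths(KELCH_MOTIFS))
    print(motifs_are_ordered_and_disjoint(KELCH_MOTIFS, PROTEIN_LENGTH))
    for position in (133, 134, 148, 155):
        print(position, containing_motif(position, KELCH_MOTIFS))
    print(residues_between(148, 155))
```

Under these stated coordinates, the di-glycine, tyrosine 148 and tryptophan 155 all fall within the third motif (116-162), and the tyrosine-to-tryptophan spacing is six residues, in agreement with the text.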

Drosophila kelch is located in the ring canals, which are actin-rich bridges. Kelch localizes to the
rim of preformed canals and serves to maintain actin organization (Xue et al., Cell (72)681-693, 1993;
Robinson et al., J. Cell Biol. (138)799-810, 1997). In mammalian sperm, calicin is located within an
actin-negative structure termed the calyx, which is involved in the morphogenesis of the spermatocyte
(von Bulow et al., Exp. Cell Res. (219)407-413, 1995). Calicin and a well-structured calyx are both
lacking in certain teratozoospermias, possibly indicating a central role for calicin in the organization of
this structure (Courtot et al., Mol. Reprod. Dev. (28)272-279, 1991). In Schizosaccharomyces pombe,
Ral2p acts downstream of Ras1p in pathways that affect cell morphology, conjugation and sporulation.
The spherical morphology and mating defects of ral2-null cells are complemented by overexpression of
Ras1p, indicating a close functional interaction between the two proteins (Fukui et al., Mol. Cell. Biol.
(9)5617-5622, 1989). The transcription factor Nrf2 is sequestered by the kelch-repeat containing Keap1
protein under normal cellular conditions. Stimulation by agents such as diethylmaleate induces the
translocation of Nrf2 to the nucleus to initiate the cytoprotective electrophilic counterattack response (Itoh
et al., Genes Dev. (13)76-86, 1999). Lytic infection of cells by herpes simplex virus is initiated by
binding of virally encoded VP16 to HCF-1, a protein thought to have a normal role in cell-cycle
progression. The HCF-VP16 complex then assembles with the Oct-1 transcription factor on cis-regulatory
targets in the HSV genome to initiate virus replication (Wilson et al., Mol. Cell. Biol. (17)6139-6146,
1997; Hughes et al., J. Biol. Chem. (274)16437-16443, 1999). Two recently discovered mammalian
kelch-repeat proteins have extracellular roles. Human attractin appears to participate in normal immune
defence as a serum glycoprotein released by activated T cells. In coculture assays, attractin stimulates
adhesion and spreading of monocytes, facilitating the development of T-cell clusters and cellular immune
responses (Duke-Cohan et al., Proc. Natl. Acad. Sci. U. S. A. (95)11336-11341, 1998). Attractin is
orthologous to the extracellular domain of mouse mahogany, a large, multidomain, transmembrane
protein that has been implicated in the homeostasis of energy metabolism by its suppressive effects on
certain types of obesity in mice (Gunn et al., Nature (398)152-156, 1999; Nagle et al., Nature (398)148-152,
1999).

Evidence for the importance of kelch repeat beta-propellers in protein function has also come
from studies of natural and engineered loss-of-function mutations. Caenorhabditis elegans Spe-26 mutant
spermatocytes fail to complete meiosis, contain multiple nuclei and show gross disorganization of actin
filaments and organelles. For five out of the six alleles that have been examined in detail, the mutations
map within the kelch repeats (Varkey et al., Genes Dev. (9)1074-1086, 1995). Of particular interest are
the point mutations in RAG-2 that have been identified in some cases of human B-cell-negative severe
combined immuno-deficiency or of Omenn syndrome (Schwarz et al., Science (274)97-99, 1996; Villa et
al., Cell (93)885-896, 1998). Very recently, gigaxonin, a new member of the cytoskeletal BTB/kelch
repeat family, was described as mutated in giant axonal neuropathy (Bomont et al., Nat. Genet. (26)370-374,
2000).

It is believed that the proteins of the invention are members of the kelch superfamily and, as
such, play a role in association with the actin cytoskeleton, the organization of cytoskeletal, plasma
membrane or organelle structures, the coordination of morphology and growth, gene expression,
viral pathogenesis, and immune defence. In particular, the proteins of the invention are highly expressed in
brain and are believed to be related to CNS disorders. Preferred polypeptides of the invention are
polypeptides comprising the amino acids of the proteins of the invention at positions 20-66, 68-114, 116-162,
164-209, 211-265 and 270-316. In one embodiment, the proteins of the invention or part thereof are used
to modulate actin organization in cells, thus affecting the cytoskeleton and cell function in general.

The invention also features compounds, e.g., proteins, which interact with the protein of the
invention. Any method suitable for detecting protein-protein interactions may be employed for identifying
transmembrane, intracellular, or extracellular proteins that interact with the protein. Among the
traditional methods which may be employed are co-immunoprecipitation, crosslinking and co-purification
through gradients or chromatographic columns of cell lysates, or proteins obtained from cell lysates, and
the use of the proteins of the invention to identify proteins in the lysate that interact with them. For these
assays, the protein of the invention can be full length or some other suitable polypeptide fragment.
Once isolated, such an interacting protein can be identified and cloned and then used, in conjunction with
standard techniques, to identify proteins with which it interacts. For example, at least a portion of the
amino acid sequence of a protein which interacts with the protein of the invention can be ascertained
using techniques well known to those of skill in the art, such as via the Edman degradation technique. The
amino acid sequence obtained may be used as a guide for the generation of oligonucleotide mixtures that
can be used to screen for gene sequences encoding the interacting protein. Screening may be
accomplished, for example, by standard hybridization or PCR techniques. Techniques for the generation
of oligonucleotide mixtures and the screening are well known (see, e.g., "PCR Protocols: A Guide to Methods and
Applications," Innis et al., eds. Academic Press, Inc., NY, 1990).
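The generation of an oligonucleotide mixture from a short stretch of amino acid sequence, as mentioned above, can be sketched by reverse translation with the standard genetic code. This is a minimal illustration only: the peptide "MWKD" is a hypothetical example and is not taken from any sequence of the invention, and the function names are assumptions introduced for this sketch. In practice, low-degeneracy stretches (rich in Met and Trp) would typically be preferred.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Reverse lookup: amino acid -> list of codons encoding it.
CODONS_FOR = {}
for codon, amino_acid in CODON_TABLE.items():
    CODONS_FOR.setdefault(amino_acid, []).append(codon)


def degeneracy(peptide):
    """Number of distinct coding sequences that could encode the peptide."""
    total = 1
    for residue in peptide:
        total *= len(CODONS_FOR[residue])
    return total


def oligonucleotide_mixture(peptide):
    """Enumerate every coding sequence for the peptide (the degenerate mixture)."""
    return ["".join(codons) for codons in product(*(CODONS_FOR[r] for r in peptide))]


def translate(dna):
    """Translate a coding sequence back to amino acids, codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    peptide = "MWKD"  # hypothetical peptide fragment, for illustration only
    mixture = oligonucleotide_mixture(peptide)
    print(degeneracy(peptide), mixture)
```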

Additionally, methods can be employed which result directly in the identification of genes which
encode proteins that interact with the protein of the invention. These methods include, for example,
screening expression libraries, in a manner similar to the well known technique of antibody probing of
lambda gt11 libraries, using a labeled polypeptide or a fusion protein, e.g., a polypeptide or
domain of the protein fused to a marker such as an enzyme, a fluorescent dye, a luminescent protein, or to an IgFc
domain.

Another embodiment of the invention relates to compositions and methods using the protein of
the invention or part thereof to modulate actin organization and related cytoskeletal protein organization
in cells and, in particular, cells of the CNS. Compositions containing the protein of the invention and
fragments thereof may be therapeutic and used to treat a variety of neuronal disorders. An additional
embodiment of the invention relates to the administration of a pharmaceutical or sterile composition, in
conjunction with a pharmaceutically acceptable carrier, for any of the therapeutic effects discussed. The
composition may be delivered via a variety of different routes.

In yet another embodiment, an antagonist of the protein of the invention may be administered to a
subject to treat or prevent a neuronal disorder. Such a disorder may include, but is not limited to,
akathisia, Alzheimer's disease, amnesia, amyotrophic lateral sclerosis, bipolar disorder, catatonia,
cerebral neoplasms, dementia, depression, diabetic neuropathy, Down's syndrome, tardive dyskinesia,
dystonias, epilepsy, Huntington's disease, peripheral neuropathy, multiple sclerosis, neurofibromatosis,
Parkinson's disease, paranoid psychoses, postherpetic neuralgia, schizophrenia, and Tourette's disorder. In
one aspect, an antibody which specifically binds the protein of the invention may be used directly as an
antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells
or tissue which express the protein of the invention.

Another embodiment of the invention encompasses DNA sequences which encode the proteins of
the invention that may be derived through biological or synthetic chemistry processes. Polynucleotide
sequences capable of hybridizing to the cDNA sequences of SEQ ID NOs: 144 and 175 are also included
in the scope of the invention.

Further included in the invention are the polypeptides encoded by the human cDNA of clones
105-021-3-0-C3-CS and 188-31-1-0-E6-CS. The polypeptides of SEQ ID NOs: 385 and 416 may be
interchanged with the corresponding polypeptides encoded by the human cDNA of clones
105-021-3-0-C3-CS and 188-31-1-0-E6-CS. Further included in the invention are polynucleotides encoding said
polypeptides. Preferred polynucleotides are those of SEQ ID NOs: 144 and 175 and of the human cDNA
of clones 105-021-3-0-C3-CS and 188-31-1-0-E6-CS.

Another embodiment of the invention relates to methods of using the nucleotide sequence of the
invention, or part thereof, to search for homologs of the protein of the invention. The sequence can be used as a template
in PCR reactions, allowing the detection/quantification of the protein of the invention, a part thereof, or its
homologs. The complementary sequence or part thereof may be used as a hybridization probe to
detect/quantify the transcription level, both in vitro and at the cellular level. In particular, such
probes may be used in a diagnostic context, for example in cellular or tissue in situ hybridization.

Another embodiment of the invention relates to methods of using the nucleotide sequence of the
invention, or part thereof, to design antisense oligonucleotides to modulate the in vitro or in vivo expression of
the protein or part thereof. This may be useful in the treatment of the diseases listed
above, particularly where the protein of the invention is expressed at abnormally high levels.

In a further embodiment of the invention, the protein of the invention or portions thereof are used to
produce specific antibodies. These antibodies may have applications in diagnostics and in the purification of
the protein of the invention, a part thereof, or a homolog. They may also help to visualize the locations of
the proteins associated with the protein of the invention in the cell, in particular for highlighting
structure-related proteins. The methods described herein in which protein antibodies are employed may be
performed, for example, by utilizing pre-packaged diagnostic kits comprising at least one specific cDNA
of SEQ ID NO. or antibody reagent described herein, which may be conveniently used, for example, in
clinical settings, to diagnose patients exhibiting symptoms of various CNS disorders.

In still another preferred embodiment, the present invention relates to methods of using the
protein of the invention or part thereof in gene therapy, particularly in diseases involving the CNS,
and particularly where the protein of the invention is expressed at abnormally low levels. Gene
therapy is a potential therapeutic approach in which normal copies of the cDNAs of SEQ ID NOs: 144
and 175 may be introduced into subjects to successfully code for normal protein in several different
affected cell types.

Another aspect of the present invention includes a formulation comprising the protein of the 
invention or part thereof and a pharmaceutically or physiologically acceptable carrier. A formulation of 
the present invention comprises a combination of one or more peptides as described herein, or mimetopes 
thereof; a combination of antibodies as described herein, or mimetopes thereof; or a combination of
antibodies and peptides as described herein, or mimetopes thereof. Such a formulation may be 
administered to a subject in need thereof to treat or prevent a disorder associated with decreased 
expression or activity of the protein. Examples of such disorders include but are not limited to those of 
the CNS and other tissues where the association of actin appears to be abnormal. 

Proteins of SEQ ID NOs: 391, 393, 405 and 407 (internal designations 145-52-2-0-D12-CS,
145-7-3-0-D3-CS, 174-17-1-0-D6-CS and 174-38-4-0-D11-CS respectively)

The cluster of four proteins (SEQ ID NOs: 391, 393, 405 and 407) encoded by the cDNAs of SEQ
ID NOs: 150, 152, 164 and 166 respectively exhibits very strong homology to claudin-8, a member of
the PMP22-Claudin family (PF00822). SEQ ID NO: 405 (174-17-1-0-D6-CS) shares a high degree of identity
with human claudin-8. SEQ ID NO: 393 contains two amino acid substitutions as compared to human
claudin-8 (T129A and S151P); thus, it appears to be a polymorphic variant of claudin-8. SEQ ID NOs: 393
and 405 contain four membrane spanning segments.

SEQ ID NOs: 391 and 407 are polymorphic forms of claudin-8. The protein of SEQ ID NO: 391
(145-52-2-0-D12-CS) is 162 amino acids long, contains three theoretical membrane spanning segments, and
shows three amino acid substitutions as compared to the previously identified claudin-8 protein (R31I, S151P
and E162). The protein of SEQ ID NO: 407 (174-38-4-0-D11-CS) is 43 amino acids long. SEQ ID NO: 407
contains a stop codon at position 44 and contains no apparent membrane spanning segments.
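The substitution labels quoted above follow the common pattern of reference residue, position, and replacement residue. A small sketch of how such labels might be parsed and compared between variants follows; the labels are copied from the text, the dictionary layout and function names are assumptions for illustration, and "E162" is left unparsed because the text gives no replacement residue for it.

```python
import re

# Pattern for a simple substitution label: reference residue, 1-based position, new residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Labels as quoted in the text, relative to the previously identified claudin-8.
VARIANTS = {
    "SEQ ID NO: 393": ["T129A", "S151P"],
    "SEQ ID NO: 391": ["R31I", "S151P", "E162"],
}


def parse_substitution(label):
    """Return (reference, position, replacement), or None if the label is not a full substitution."""
    match = SUBSTITUTION.match(label)
    if match is None:
        return None
    reference, position, replacement = match.groups()
    return reference, int(position), replacement


def shared_substitutions(variants):
    """Return the labels present in every variant."""
    label_sets = [set(labels) for labels in variants.values()]
    return set.intersection(*label_sets)


if __name__ == "__main__":
    for name, labels in VARIANTS.items():
        print(name, [(label, parse_substitution(label)) for label in labels])
    print(shared_substitutions(VARIANTS))
```

On the labels as quoted, the only substitution common to SEQ ID NOs: 391 and 393 is S151P.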

The Claudin family of proteins comprises more than twenty (20) small glycoproteins with four
predicted transmembrane domains. The tissue distribution pattern of claudins varies significantly, depending
on the claudin species. Many have been identified as components of tight junction (TJ) strands, which contribute
to the regulation of cell polarity and permeability. Polarized epithelial and endothelial cells form barriers that
separate biological compartments and regulate homeostasis. The tight junction (TJ) is a specialized
membrane domain at the most apical region of polarized epithelial and endothelial cells that not only creates
a primary barrier to prevent paracellular transport of solutes (barrier function) but also restricts the lateral
diffusion of membrane lipids and proteins to maintain the cellular polarity (fence function). Tight junctions
appear to represent a continuous network of interconnected rows of intramembranous particles that appear as
strands with complementary grooves. The TJ-specific integral membrane proteins, i.e. the components of TJ
strands, occludins and claudins, were only recently identified.

Claudin-1 and -2 have the ability to induce the formation of networks of strands/grooves at cell-cell
contact sites when introduced into fibroblasts lacking TJs. Occludin induces only a small number of short
strands at cell-cell contact sites in fibroblasts; thus, it is an accessory protein in terms of TJ strand formation.
Claudin transfection experiments in fibroblasts revealed that the TJ strand itself can be formed without occludin
(Saitou M et al. J. Cell Biol. 141: 397-408 (1998), Furuse M et al. J. Cell Biol. 143: 391-401 (1998)).

Initially several members of the claudin family were reported (RVP1, Clostridium perfringens
enterotoxin receptor (CPE-R), and TMVCF (transmembrane protein deleted in Velo-cardio-facial
syndrome)), but their physiological functions were not determined. After the identification of claudin-1 and -2
as novel components of TJ strands (Furuse, M. et al. J. Cell Biol. 141, 1539-1550 (1998)), CPE was
shown to remove specific claudins from TJs. In its presence, TJ strands in C3L cells gradually disintegrate
and the number of TJ strands and the complexity of their network decrease markedly (Sonoda N et al. J Cell
Biol 147(1): 195-204 (1999)). In distal tubules of the kidney, claudin-4 (CPE-R) and claudin-8 were
co-localized with occludin at their junctional complex region. In liver, claudin-3 and occludin were co-localized
along bile canaliculi and TJ strands were labeled heavily and specifically with anti-claudin-3 Ab (Morita et al.
PNAS 96(2):511-516 (1999)). The claudins have been shown to create the paracellular diffusion barrier and,
surprisingly, they may also confer channel-like selectivity for passage of solutes through the tissue barrier
(Anderson JM and Van Itallie CM, Current Biology 9:R922-R924 (1999)).

The existence of the claudin multigene family as well as the tissue distribution pattern of each
claudin species suggests that similar complexity can be expected in TJs and contributes to the generation of
functional diversity of TJs in vivo. More than two distinct claudins are co-expressed in single epithelial cells.
Claudins interact between each of the paired strands in a heterophilic manner and distinct claudins are (except
in some combinations) co-incorporated into individual TJ strands (Furuse M et al. J Cell Biol 147(4):891-903
(1999)).

Several claudins have been shown to be expression markers of malignant cells. For example,
SEMP1 (senescence-associated epithelial membrane protein) is expressed in normal human tissues, including
adult and fetal liver, pancreas, placenta, adrenals, prostate and ovary; however, SEMP1 is expressed at low or
undetectable levels in a number of human breast cancer cell lines (Swisshelm K et al. Gene 226:285-295
(1999)). Another member of the claudin family was found to be exclusively expressed in MCF-7ADR
mammary carcinoma cells. MCF-7ADR carcinoma cells are estradiol-independent for growth,
estrogen-receptor negative, tamoxifen resistant, vimentin positive and invasive in vitro and in vivo (Schiemann S et al.,
Anticancer Res 17(1A):13-20 (1997)). Further, down regulation of the expression of claudin-1 has been
associated with oncogenesis in rat salivary gland epithelium cells (Li D and Mrsny RJ J Cell Biol
148(4):791-800 (2000)).

The increase in microvascular permeability in human gliomas, contributing to clinically severe
symptoms of brain edema, appears to be the result of a dysregulation of junctional proteins. Increased TJ
permeability of the colon epithelium, and consequently a decrease in epithelial barrier function, precedes the
development of colon tumors, including carcinomas and adenomatous polyps (Soler AP et al. Carcinogenesis
20(8):1425-1431 (1999)). Studies of the interendothelial junctions in tumor microvessels of human
glioblastoma multiforme show that the expression of claudin-1 is lost in the majority of tumor microvessels,
whereas claudin-5 is significantly down-regulated only in hyperplastic vessels. A relationship between
claudin-1 suppression and the alteration of tight junction morphology is likely to correlate with the increase
of endothelial permeability (Liebner S et al. Acta Neuropathol (Berl) 100(3):323-331 (2000)).

The human phenotype of mutations in claudin-16 suggests that it creates a channel allowing
magnesium to diffuse through renal tight junctions. Similarly, a mouse knockout of claudin-11 reveals its role
in formation of tight junctions in myelin and between Sertoli cells in testis (Mitic LL et al. Am J Physiol
Gastrointest Liver Physiol 279(2):G250-254 (2000)).

Opening of TJs by environmental proteinases may be the initial step in the development of asthma in
response to a variety of allergens. The lung epithelium forms a barrier that allergens must cross before they can cause
sensitization. The cysteine proteinase allergen Der p 1 from fecal pellets of Dermatophagoides pteronyssinus
(the house dust mite (HDM)) causes disruption of intercellular tight junctions (TJs), which are the principal
components of the epithelial paracellular permeability barrier. TJ breakdown nonspecifically increases
epithelial permeability, allowing Der p 1 to cross the epithelial barrier. Putative Der p 1 cleavage sites were
found in peptides from an extracellular domain of claudin-1. House dust mite (HDM) allergens are important
factors in the increasing prevalence of asthma (Wan H et al. J Clin Invest 104(1):123-33 (1999)).

In many intestinal and systemic diseases, intestinal barrier damage is marked by changes in intestinal
permeability which are, in turn, related to alteration in tight junction function (Gasbarrini G, Montalto M, Ital J
Gastroenterol Hepatol 31(6):481-488 (1999)). Permeability of the tight junctions can be modified by bacterial
toxins, cytokines, hormones and drugs. Oligodendrocyte-specific protein (OSP/claudin-11), found in CNS
myelin, appears to be a promising candidate for auto-antigenic involvement in autoimmune demyelinating
disease. The presence of anti-OSP Abs in the cerebrospinal fluid was reported for relapsing-remitting
multiple sclerosis (MS). Murine OSP peptides elicit clinical experimental autoimmune encephalomyelitis in
animal models for MS and induce mononuclear cell infiltrates and focal demyelination. OSP peptides also
elicit robust proliferative responses in T cells (Stevens DB et al. J Immunol 162:7501-7509 (1999)).
OSP/claudin-11 appears to modulate proliferation and migration of oligodendrocytes, presumably through
membrane interactions at tight junctions and with the extracellular matrix (Bronstein JM et al. J Neurosci Res
59(6):706-711 (2000)). Recently claudin-11 has been shown to play a key role in the formation of the
hematotesticular barrier; it is regulated by FSH and by cytokines in early fetal and postnatal
development in Sertoli cells (Hellani A. et al. Endocrinology 141:3012-3019 (2000)).

SEQ ID NOs: 391, 393, 405 and 407 are new human proteins having biological activities described
for claudins. Nucleic acids encoding the proteins of interest are overrepresented in fetal kidney and in
salivary gland. The subject invention provides polynucleotides encoding the proteins of SEQ ID NOs: 391,
393, 405 and 407. In one embodiment, the polypeptides of SEQ ID NOs: 391, 393, 405 and 407 may be
interchanged with the polypeptides encoded by clones 145-52-2-0-D12-CS, 145-7-3-0-D3-CS,
174-17-1-0-D6-CS and 174-38-4-0-D11-CS. Also provided is the use of these proteins, fragments, and derivatives thereof (and
related polynucleotides) for the diagnosis, treatment, or prevention of tumors and other diseases, including
disorders associated with altered epithelial function. The invention also encompasses possible variants of the
proteins of interest which have at least about 80%, more preferably at least about 90%, and most preferably at
least about 95% amino acid sequence identity to the amino acid sequence, provided the variants have at least
one of the functional or structural characteristics of the identified claudin-like proteins.
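The identity thresholds recited above can be illustrated with a simple percent-identity calculation over two pre-aligned sequences. This is a sketch under stated assumptions: the text does not fix an alignment method or a denominator here, so the code assumes the sequences are already aligned to equal length, counts a gap opposite a residue as a mismatch, and treats the "about" thresholds as exact cut-offs. The two short sequences are hypothetical and are not drawn from the sequence listing.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns with identical residues (assumes pre-aligned input)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap and residue_b == gap:
            continue  # columns that are gaps in both sequences carry no information
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


def identity_tier(percent):
    """Return the highest recited threshold (95, 90 or 80) met, or None."""
    for threshold in (95, 90, 80):
        if percent >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    reference = "MKTLLVAGAL"  # hypothetical aligned reference fragment
    variant = "MKTLLVSGAL"    # hypothetical variant with one substitution
    pct = percent_identity(reference, variant)
    print(pct, identity_tier(pct))
```

A production comparison would normally rely on an established alignment tool and its documented identity definition rather than this simplified column count.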

In one embodiment of the subject invention, the proteins of interest, or biologically active fragments
or variants thereof, may be administered to a subject to treat or prevent disorders of salivary gland, kidney
and prostate. The subject invention also provides therapeutic regimens for the treatment of epithelial
dysfunction and cancer.

The disorders which may be treated in accordance with the subject invention include, but are not
limited to, asthma, eczema, atopic dermatitis, contact dermatitis, stasis dermatitis, seborrheic dermatitis,
psoriasis, lichen planus, pityriasis rosea, acne vulgaris, acne rosacea, pemphigus vulgaris, pemphigus
foliaceus, paraneoplastic pemphigus, bullous pemphigoid, herpes gestationis, dermatitis herpetiformis, linear
IgA disease, epidermolysis bullosa acquisita, dermatomyositis, lupus erythematosus, scleroderma, and
morphea; gastritis, peptic ulcers, cholelithiasis, cholecystitis, pancreatitis, cirrhosis, ulcerative colitis, Crohn's
disease, and irritable bowel syndrome; Addison's disease, Lowe's syndrome, glomerulonephritis, chronic
glomerulonephritis, tubulointerstitial nephritis, inherited X-linked nephrogenic diabetes insipidus, autosomal
dominant polycystic kidney disease, autoimmune demyelinating disease, multiple sclerosis, glioma, and other
tumors.

A further aspect of the invention provides a method for treating these and/or other pathological states
by administering, to a patient, a therapeutically effective amount of one or more of the proteins of interest.
The proteins of interest may, optionally, be simultaneously or sequentially administered in conjunction with
cytokines and/or interleukins which have been shown to improve claudin expression.

In another embodiment, a vector capable of driving expression of one or more of the proteins of
interest, or a biologically active fragment or variant thereof, may be administered to a subject to treat or
prevent an epithelial permeability disorder including, but not limited to, the aforementioned disorders.

Another embodiment of the subject invention provides compositions and methods of treating, or
reducing the incidence of, asthma comprising the administration of therapeutically effective amounts of the
proteins of the subject invention. In one embodiment, purified fragments, or synthetically modified peptides,
derived from the extracellular domains of the proteins of interest may be administered in the therapeutic
regimen. The peptides, containing the putative cleavage sites for environmental allergen proteinases, may be
administered in amounts which competitively inhibit the proteinase activity of the allergen. The peptides
may be designed to bind allergen, optionally in an irreversible manner, and inhibit proteinase activity.

The negative effects of the usual preservation solutions on epithelial and endothelial permeability in
organs to be transplanted are generally known (Trocha S.D. et al. Ann.Surg. 230: 105-113 (1999)). Increases
in permeability lead to tissue injury and edema. Disorganization of tight junctional proteins appears to be
responsible for the observed tissue injury and edema. Thus, in another embodiment, purified proteins of
interest, or variants and/or biologically active fragments thereof, may be added to organ preservation
solutions to maintain the content and integrity of tight junctions in organs.

In another embodiment, the subject invention provides methods of producing "bioartificial" epithelia
from non-epithelial cells. The "bioartificial" epithelia produced according to the invention may be used for
reconstructive surgical procedures, for treating disorders related to epithelial loss (for hereditary, traumatic
or oncological reasons) or for other therapeutic purposes (e.g., burn treatments). "Bioartificial" epithelial
cells can be obtained by transfection and remodeling of the autologous patient cells not affected by any of the
aforementioned disorders. The use of autologous cells in the preparation of the "bioartificial" epithelial cells
of the invention in methods of treating disorders, conditions, or diseases associated with the loss of epithelial
cells reduces or eliminates the risk of tissue rejection typically observed in transplantation methodologies.
Methods of bioartificial tissue engineering are generally known to those skilled in the art (for a review, see
Machens H.G. et al. Cells Tissues Organs 167: 88-94 (2000)).

Another embodiment of the subject invention provides antibodies which specifically bind to the
proteins of SEQ ID Nos: 391, 393, 405 and 407. The antibodies may also specifically bind to fragments or
variants of the proteins described in SEQ ID Nos: 391, 393, 405 and 407. The antibodies of the invention
may be used to detect the protein of interest in human body fluids, extracts of cells or tissue extracts. The
detection assays may be used for epithelial cancer prognosis and for the diagnosis of disorders. The assays
may also be used to monitor patients being treated with the proteins of interest.

In another embodiment, the polynucleotide sequences, or fragments of said polynucleotide
sequences, encoding the proteins of interest may be used for the identification or diagnosis of a disorder
associated with expression of the proteins of SEQ ID Nos: 391, 393, 405 and 407. Hybridization assays
which allow for the detection of polynucleotide sequences of the invention are well known to the skilled
artisan. These assays include, but are not limited to, Northern blots, Southern blots, and PCR methodologies.

Another embodiment of the invention provides the proteins of SEQ ID Nos: 391, 393, 405 and 407,
variants, immunogenic fragments, or biologically active fragments of said proteins for screening libraries of
compounds in any of a variety of drug screening techniques. The proteins of SEQ ID Nos: 391, 393, 405 and
407, variants, immunogenic fragments, or biologically active fragments of said proteins employed in such
screening may be free in solution, affixed to a solid support, recombinantly expressed on, or chemically
attached to, a cell surface, or located intracellularly. The formation of binding complexes between the protein
of interest and the agent being tested may be measured by methods well known to those skilled in the art.

Yet another embodiment of the invention provides methods of screening compounds which modulate
epithelial permeability. Polynucleotides encoding the proteins of SEQ ID Nos: 391, 393, 405 and 407,
variants, immunogenic fragments, or biologically active fragments of said proteins, may be recombinantly
expressed in cells typically lacking TJs according to methods discussed supra. These cells may then be used
to assess therapeutic modulators (based, for example, on CPE-like compounds) for the ability to increase or
decrease epithelial cell permeability. Compounds identified in these modulator screen assays may then be
used in therapeutic protocols to adjust epithelial cell permeability as desired by the practitioner.

The intestinal epithelium is a major barrier to the absorption of hydrophilic drugs. The presence of
intercellular junctional complexes, particularly the tight junctions, renders the epithelium impervious to
hydrophilic drugs, which cannot diffuse across the cells through the lipid bilayer of the cell membranes
(Ward PD et al. Pharmaceutical Science and Technology Today 3:10:346-358 (2000)). Therefore, in another
embodiment of the subject invention the proteins of SEQ ID Nos: A, B, C, or D, variants, or biologically
active fragments of said proteins and their molecular partners can be used for the rational design of
compounds that can effectively and safely increase paracellular permeability for selected drugs. For example,
polynucleotides encoding the proteins of interest, or any fragments or derivatives thereof, may be used for
these purposes. In one aspect, the complement of the polynucleotide encoding the protein of interest may be
used in situations in which it would be desirable to block the transcription of the mRNA encoding the
proteins of interest, especially for temporarily increasing epithelial permeability (useful for drug delivery).
Alternatively, sense or antisense oligonucleotides may be designed from various locations along the coding or
control regions of polynucleotide sequences encoding the proteins of SEQ ID Nos: 391, 393, 405 and 407, as
well as variants, or biologically active fragments of said proteins to control expression of the proteins.
Methods of producing and using sense and antisense oligonucleotides are well known to those skilled in the
art.
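
As a purely illustrative aid, the design of antisense oligonucleotides "from various locations along the coding
or control regions" can be sketched as taking fixed-length windows of a sense-strand sequence and computing the
reverse complement of each. The following minimal Python sketch is not part of the described methods; the
function names, the window length and step, and the short placeholder sequence are all assumptions introduced
for illustration, and the placeholder is not a sequence of this application.

```python
from typing import List, Tuple

# Watson-Crick complement for unambiguous DNA bases only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string containing only A/C/G/T."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def antisense_oligos(sense: str, length: int, step: int = 1) -> List[Tuple[int, str]]:
    """Tile a sense-strand region with antisense oligos of a fixed length.

    Returns (1-based start position on the sense strand, oligo written 5'->3').
    """
    oligos = []
    for start in range(0, len(sense) - length + 1, step):
        window = sense[start:start + length]
        oligos.append((start + 1, reverse_complement(window)))
    return oligos


if __name__ == "__main__":
    # Hypothetical placeholder fragment, NOT taken from any SEQ ID in this document.
    placeholder_sense = "ATGGCCTCGACC"
    for position, oligo in antisense_oligos(placeholder_sense, length=6, step=3):
        print(position, oligo)
```

In practice, candidate oligonucleotides would be further filtered (for example for specificity and secondary
structure) using the methods known to those skilled in the art; the sketch only shows the complementarity step.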

Claudins are unique proteins with specific protein-binding properties. Therefore, in another preferred
embodiment, the proteins of SEQ ID Nos: 391, 393, 405 and 407, variants, or biologically active fragments
of said proteins may be used as a component of drug delivery vehicles such as colloids or liposomes. The
proteins of SEQ ID Nos: 391, 393, 405 and 407, variants, or biologically active fragments of
said proteins may be incorporated into the lipid membranes of liposomes and can serve as specific targeting
agents which bind the specific epithelial targets and facilitate targeted epithelium drug delivery. Methods
of designing such drug delivery systems are known to those skilled in the art (Smith H.J. Introduction
to the principles of drug design and action, 3rd ed. (1998)). Alternatively, active agents, such as
chemotherapeutic agents, radioisotopes, or prodrugs, may be directly attached, recombinantly or chemically, to
the proteins of SEQ ID Nos: 391, 393, 405 and 407, variants, or biologically active fragments of said
proteins and used in therapeutic regimens.

Proteins of SEQ ID Nos: 278, 282 and 300 (internal designations 160-37-2-0-H7-CS, 174-33-3-0-F6-CS,
184-4-1-0-A11-CS, respectively)

The protein of SEQ ID No: 278 (and the corresponding allelic variants 282 and 300) encoded by the
cDNA SEQ ID No: 37 (41, 59 respectively) shows homology to a human transmembrane protein (HTMN-23,
Genseq accession number Y57899). The protein of SEQ ID No: 278 (and the corresponding polymorphic
variants 282 and 300), overexpressed in salivary gland, contains 9 potential transmembrane segments from
positions 85 to 105, 116 to 136, 164 to 184, 187 to 207, 332 to 352, 376 to 396, 404 to 424, 465 to 485 and
499 to 519, thus displaying characteristic features of type III transmembrane proteins (Singer, Annu. Rev. Cell
Biol., 6:247-296, 1990). Furthermore, a predicted localisation in the endoplasmic reticulum (ER) is found for
the protein of the invention with the software psort.

The normal functioning of the eukaryotic cell requires that all the newly synthesized proteins be
correctly folded, modified, and delivered to specific intra- and extracellular sites. Newly synthesized
membrane and secretory proteins enter a cellular sorting and distribution network during or immediately after
synthesis and are routed to specific locations inside and outside the cell. The initial compartment in this
process is the endoplasmic reticulum (ER) where proteins undergo modifications such as glycosylation,
disulfide bond formation, and assembly into oligomers. The modified proteins are then transported through a
series of membrane-bound compartments which include the various cisternae of the Golgi complex where further
carbohydrate modifications occur. Transport between compartments occurs by means of vesicles that bud and
fuse in a manner specific to the type of protein being transported. Once within the secretory pathway, proteins
do not have to cross a membrane to reach the cell surface. Disruptions in the cellular secretory pathway have
been implicated in several human diseases. In familial hypercholesterolemia the low density lipoprotein
receptors remain in the ER rather than moving to the cell surface (Pathak, et al., J. Cell Biol., 106:1831-1841,
1988). Altered transport and processing of the beta-amyloid precursor protein (betaAPP) involves the
putative vesicle transport protein presenilin, and may play a role in early-onset Alzheimer disease
(Levy-Lahad et al., Science, 269:973-977, 1995). Changes in the ER-derived calcium homeostasis have been
associated with diseases such as cardiomyopathy, cardiac hypertrophy, myotonic dystrophy, Brody disease,
Smith-McCort dysplasia and diabetes mellitus.

It is believed that the protein of the invention represents a new ER integral transmembrane protein.
This protein probably plays a role in post-translational modifications of secreted and membrane proteins. Its
dysregulated expression may be linked to disorders such as the diseases referred to above. Preferred
polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID Nos: 278, 282 and 300
from positions 85 to 105, 116 to 136, 164 to 184, 187 to 207, 332 to 352, 376 to 396, 404 to 424, 465 to 485
and 499 to 519. Other preferred polypeptides of the invention are fragments of SEQ ID Nos: 278, 282 and
300 having any of the biological activities described herein.
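
For illustration only, the position ranges recited above can be read as 1-based, inclusive coordinates on the
amino acid sequence, so that each recited segment is 21 residues long. The minimal Python sketch below shows how
such segments could be sliced out of a sequence string. The function name and the placeholder sequence are
assumptions introduced here; the placeholder is an arbitrary repeating string of the right length, not the
sequence of SEQ ID No: 278, 282 or 300.

```python
from typing import List, Tuple

# Segment boundaries as recited in the text (1-based, inclusive).
TM_SEGMENTS: List[Tuple[int, int]] = [
    (85, 105), (116, 136), (164, 184), (187, 207), (332, 352),
    (376, 396), (404, 424), (465, 485), (499, 519),
]


def extract_segments(protein: str, segments: List[Tuple[int, int]]) -> List[str]:
    """Return the sub-sequences covered by 1-based, inclusive (start, end) pairs."""
    fragments = []
    for start, end in segments:
        if not 1 <= start <= end <= len(protein):
            raise ValueError(f"segment {start}-{end} lies outside the sequence")
        # Convert 1-based inclusive coordinates to a Python slice.
        fragments.append(protein[start - 1:end])
    return fragments


if __name__ == "__main__":
    # Hypothetical placeholder of at least 519 residues, NOT a real sequence.
    placeholder = ("ACDEFGHIKLMNPQRSTVWY" * 26)[:519]
    for (start, end), fragment in zip(TM_SEGMENTS, extract_segments(placeholder, TM_SEGMENTS)):
        print(start, end, len(fragment), fragment)
```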

One object of the present invention is to provide compositions and methods of targeting heterologous
polypeptides to the endoplasmic reticulum by recombinantly or chemically fusing a fragment of the proteins
of the invention to a heterologous polypeptide. Preferred fragments are any fragments of the proteins of the
invention, or part thereof, that may contain targeting signals for the endoplasmic reticulum such as those
described in Pidoux AL, Armstrong EMBO J 1992 Apr;11(4):1583-91; Munro S, Pelham HR Cell 1987 Mar
13;48(5):899-907; Pelham HR Trends Biochem Sci 1990 Dec;15(12):483-6.

In another embodiment, the invention relates to methods and compositions using the protein of
the invention or part thereof as marker proteins to selectively identify tissues, preferably salivary glands.
For example, the proteins of the invention or part thereof may be used to synthesize specific antibodies
using any techniques known to those skilled in the art including those described herein. Such
tissue-specific antibodies may then be used to identify tissues of unknown origin, for example, forensic samples,
differentiated tumor tissue that has metastasized to foreign bodily sites, or to differentiate different tissue
types in a tissue cross-section using immunochemistry.

Moreover, antibodies to the proteins of the invention or parts thereof may be used for the
detection of endoplasmic reticulum in immunochemistry, for example, using any techniques known to those
skilled in the art including those described herein.

Protein of SEQ ID No: 281 (174-10-2-F8-CS)

The protein of SEQ ID No: 281 is homologous to PET117 (SwissProt ID: Q02771). MTC is
overexpressed in the brain, dystrophic muscle, fetal liver, placenta and salivary glands.

The protein of the invention, herein named MTC, presents a certain homology with the yeast
PET117 protein precursor (22% identical amino acids, 39% positive amino acids when aligned by
BLASTP 2.0.9). MTC appears to be a novel member of the PET family.

Cytochrome c oxidase (complex IV), an enzyme complex located in the mitochondrial inner
membrane, is the terminal member of the mitochondrial electron transport chain. The oxidation reaction
catalyzed by cytochrome c oxidase is exergonic and is coupled to the translocation of protons across the
membrane. This reaction provides the energy needed to drive the synthesis of ATP by the mitochondrial
oxidative phosphorylation system and is essential for respiratory metabolism in aerobic eukaryotes.
Cytochrome c oxidase is made up of as many as 13 non-identical protein subunits, of which 3 are encoded
by the mitochondrial genome, and contains several prosthetic groups (including heme groups a and a3).

The composition of cytochrome c oxidase requires that synthesis and assembly of a functional
enzyme complex occur in several distinct steps, including: 1) synthesis of the protein subunits, 2)
transport of the subunits from their site of synthesis to their site of function in the mitochondrial inner
membrane, 3) synthesis of hemes a and a3, and 4) assembly of the subunits with each other and with the
prosthetic groups. A number of "accessory" genes (i.e., not encoding protein subunits of the final
assembled cytochrome c oxidase complex) are required for the production of functional cytochrome c
oxidase (McEwen JE et al. - J Biol Chem - 1986, 261(25):11872-9). Some of these are required for the
expression of mitochondrial-encoded cytochrome c oxidase subunits, while others are needed for the
proper assembly of active cytochrome c oxidase.

The nuclear genes PET117 and PET191 belong to this class of "accessory genes" required for the
assembly of active mitochondrial cytochrome c oxidase (McEwen JE et al. - Curr Genet. - 1993,
23(1):9-14). The role of PET genes and the proteins that they encode remains obscure, although mutation
experiments in S. cerevisiae have clearly shown that they are essential for the production of active
cytochrome c oxidase (McEwen JE et al. - J Biol Chem - 1986, 261(25):11872-9) (McEwen JE et al. -
Curr Genet. - 1993, 23(1):9-14).

One aspect of the subject invention provides compositions and methods of using the nucleotide
sequence of SEQ ID No: 40, or its complement, in molecular biology techniques. In one embodiment, the
MTC2 sequence is encoded by clone 174-10-2-0-F8-CS. References to a polynucleotide of SEQ ID NO:
40 and polypeptide of SEQ ID NO: 281 are interchangeable with the corresponding polynucleotides of
the human cDNA of clone 174-10-2-0-F8-CS and polypeptides encoded thereby. These techniques
include, but are not limited to: PCR; production of recombinant MTC, or biologically active fragments
thereof; generating antisense RNA and DNA, their chemical analogs and the like; hybridization probes;
and chromosome gene mapping.

As is apparent to one skilled in the art, all of the non-limiting techniques listed above can be
practiced with fragments of the mtc2 gene. Given the well-known nature of these techniques, the skilled
artisan will be able to select an appropriate length of the mtc2 polynucleotide for use in the techniques.
For recombinant expression of protein, a preferred embodiment provides the full length MTC2 gene in an
expression vector.

For example, the nucleotide sequence of SEQ ID No: 40 or its complement can be used to generate
hybridization probes for mapping the naturally occurring genomic sequence. The sequence can be
mapped to a particular chromosome or to a specific region of the chromosome using well-known
techniques. These include in situ hybridization to chromosomal spreads, flow-sorted chromosomal
preparations, or artificial chromosome constructions such as yeast artificial chromosomes, bacterial
artificial chromosomes, bacterial P1 constructions or single chromosome cDNA libraries as reviewed in
Price (Price CM - Blood Rev. - 1993, 7(2):127-34) and Trask B (Trask BJ - Trends Genet. - 1991,
7(5):149-54).

In situ hybridization of chromosomal preparations and physical mapping techniques such as
linkage analysis using established chromosomal markers are invaluable in extending genetic maps that
provide valuable information to investigators searching for disease genes using positional cloning or
other gene discovery techniques. Once a disease or syndrome has been crudely localized by genetic
linkage to a particular genomic region, any sequences mapping to that area can represent associated or
regulatory genes for further investigation. The nucleotide sequence of the present invention can also be
used to detect differences in the chromosomal location due to translocation, inversion, etc. among normal,
carrier or affected individuals.

The subject invention also provides methods of using MTC polypeptides and polynucleotides
encoding said polypeptides in preventing or reducing the incidence of apoptosis in cells. Dysfunctions in
the mitochondrial electron transport chain result in cellular apoptosis or necrosis. In one embodiment,
MTC is added to an in vitro culture of mammalian cells in an amount effective to reduce apoptosis. In
another embodiment, cells are transfected with vectors comprising MTC polynucleotides which cause the
expression of MTC peptides. MTC used in these embodiments can, optionally, contain mitochondrial
targeting sequences. In another embodiment, MTC or MTC2 is encoded by clone 174-10-2-0-F8-CS.

In another embodiment, MTC polypeptides and polynucleotides encoding said polypeptides can
be used in the diagnosis, treatment and/or prophylaxis of disorders associated with apoptosis or
impairment of the mitochondrial respiratory electron transport chain. Polynucleotides can also be used in
antisense protocols for certain disorders to impair the function of the mitochondrial electron transport
chain. These disorders include, but are not limited to, immune deficiency syndromes (including AIDS);
type I diabetes; pathogenic infections; cardiovascular and neurological injury; alopecia; aging;
degenerative diseases such as Alzheimer's Disease, Parkinson's Disease, Huntington's disease; dystonia;
Leber's hereditary optic neuropathy; schizophrenia; neonatal hepatic failure and ketoacidotic coma
necrosis; and myodegenerative disorders such as "mitochondrial encephalopathy, lactic acidosis, and
stroke" (MELAS), "myoclonic epilepsy ragged red fiber syndrome" (MERRF); mitochondriocytopathies,
Leigh syndrome, fatal infantile cardioencephalomyopathy, ataxia; encephalopathies, aging,
neurodegenerative diseases, myopathies, and cancers. As would be apparent to the routineer, these
methods can be practiced with full length MTC polypeptides and polynucleotides encoding said
polypeptides as well as biologically active fragments of the same which retain biological activity.

For diagnostic purposes, the expression of the protein of the invention could be investigated using
any of the Northern blotting, RT-PCR or immunoblotting methods well known to those skilled in the art.
For prophylaxis and/or treatment purposes, SEQ ID No: 40, its complement, or fragments of either, can be
used to enhance electron transport and increase energy delivery using any of the gene therapy methods
known to those skilled in the art. Likewise, SEQ ID NO: MTC2, its complement, and fragments of either
can be used to impair electron transport and decrease energy delivery using any of the antisense
methodologies known to those skilled in the art.

Protein of SEQ ID NO:392 (O45-7-2-0-G5-CS)

The protein of SEQ ID No:392, encoded by the cDNA of SEQ ID No: 151, is homologous to
Unc-18 proteins, also known as the STXBP or Sec-1 family. The protein of the invention is strongly
expressed in the fetal kidney.

Amino acids 89 to 107 of the protein of the invention present the EMotif signature for proteins of
the Sec-1 family (BlocksPlus PF00995). Furthermore, BLAST analysis (BLASTP version 2.0.9) of the
amino acid sequence of the invention reveals that it is homologous to a number of proteins belonging to
the Unc-18/Sec-1 family. Preferred polypeptides of the invention are those that comprise amino acids 94,
95, and/or 100, which are conserved in more than 80% of the Sec-1 family members; and/or amino acids
43, 89, and/or 97, which are conserved in more than 60% of Sec-1 family members. Other preferred
polypeptides of the invention are any fragment of SEQ ID NO:392 having any of the biological activities
described herein.

The normal function and organization of eukaryotic cells is dependent on transport of various
vesicles that selectively shuttle membrane and cargo between distinct compartments of the secretory and
endocytotic pathways. A number of key proteins involved in membrane targeting and exocytosis have
been identified, and a fundamental set of interactions has been defined and placed into a model called the
SNARE (Soluble N-ethylmaleimide-sensitive Attachment protein REceptor) hypothesis (Rothman J -
Nature - 1994, 372: p55-63). According to the SNARE hypothesis, vesicles dock to a target membrane
through the interaction of complementary sets of vesicular (v-SNARE) and target (t-SNARE) membrane
proteins. Our understanding of vesicle trafficking has, to a large extent, been facilitated by
characterization of synaptic vesicles in neurons. In synaptic vesicle exocytosis, the vesicular protein
synaptobrevin (also called Vesicle-Associated Membrane Protein; VAMP) is the v-SNARE, and the
plasma membrane-associated protein SNAP-25 (Synaptosomal-Associated Protein of 25 kDa) and
syntaxin 1 function as t-SNAREs. Formation of the SNARE complex (or core complex) is followed by
recruitment of the cytosolic proteins alpha, beta and gamma SNAP (Soluble N-ethylmaleimide-sensitive
Attachment Protein) and NSF (N-ethylmaleimide-Sensitive Factor), which are required for membrane
fusion. Proteins from two gene families have been identified as key regulators of SNARE complex
assembly. These include members of the small GTP-binding family (e.g. Rabs) and the Sec-1 family.
The Sec-1 gene is one of ten genes identified as essential for the final stages of protein secretion in yeast
(S. cerevisiae). Sec-1 homologues have been identified in the nervous system of C. elegans (Unc-18), D.
melanogaster (Rop) and mammals. In mammals, the protein has been termed Mammalian homologue of
the Unc-18 gene (Munc-18), rbSec1 (Rat Brain Sec1) or n-Sec1 (neural-specific Sec1).

Sec-1-related proteins are involved in the processes of vesicle targeting, docking and/or fusion.
Sec-1-related proteins interact directly with the t-SNARE syntaxin, and Munc-18 has been found to
interact with syntaxin isoforms 1a, 2 and 3. However, Munc-18 has not been found to be part of the 20S
SNARE/SNAP/NSF protein complex. In vitro, the binding of Munc-18 to syntaxin inhibits the
interaction of syntaxin with VAMP and SNAP-25 as well as SNAP-23 (a homologue of SNAP-25) and
thereby negatively regulates the formation of the synaptic SNARE fusion complex. In agreement with a
negative regulatory role of Sec-1/Munc-18 proteins in neurotransmitter release are results showing that
microinjections of Sec-1 into the presynaptic terminal of the giant squid synapse inhibit evoked
transmitter release (Dresbach T. et al. - J Neurosci. - 1998, 18: p2923-2932). Furthermore,
overexpression of Rop, Unc-18, Sec-1 and Munc-18 all result in phenotypes associated with a complete
block in neurotransmitter release and/or secretion (Hosono R. et al. - J Neurochem - 1992; 58:
p1517-1525; Harrison S. et al. - Neuron - 1994; 13: p555-566; Novick P. et al. - Cell - 1981; 25: p461-469;
Verhage M. et al. - Science - 2000; 287: p864-869). Point mutation experiments involving the Rop gene
suggest that Rop is a rate-limiting regulator of exocytosis that performs both stimulatory and inhibitory
functions in neurotransmission (Wu M. et al. - EMBO J - 1998; 17: p127-139). The reduction in
neurotransmitter release seen after both overexpression of Munc-18 and mutations in Munc-18
homologues indicates that Sec-1 proteins not only sequester syntaxins from other proteins but also assist
the syntaxins in adopting a functional conformation or facilitate interactions between syntaxins and other
proteins by a chaperone-like action. The necessity of Sec1-related proteins is believed to result, in part,
from their direct and high affinity interaction with members of the t-SNARE family of syntaxin proteins
and from the control by this complex of a v- and t-SNARE protein interaction required for vesicle fusion.

The SNARE mechanism of exocytosis appears to be conserved both evolutionarily (most of the
components have homologues in species from yeast to mammals) and functionally (each of the principal
components is a member of a multigene family). This latter point is supported by work showing that
components of this pathway are found in different cell types (neurons, neutrophils and pancreatic
beta-cells) (Brumell J. et al. - J Immunol - 1995; 155: p5750-5759; Zhang W et al. - J Biol Chem. - 2000 Oct
6, electronic publication).

It is believed that the protein of SEQ ID NO:392 is a member of the Unc-18/Sec-1 family, and
thus plays a key role in the regulation of various processes including vesicle targeting, docking and
fusion.

One embodiment of the present invention relates to the use of the protein of SEQ ID NO:392 or
the cDNA of SEQ ID NO: 151 or any part thereof to identify fetal kidney tissue and cells derived
from this tissue, since the protein of the invention is strongly expressed in this tissue. In addition, the
protein of the invention can be used to specifically label components of the secretory pathway within
cells. Assays for the detection of cells expressing the protein of the invention, or part thereof, can be
developed using techniques known to those skilled in the art. For example, the protein of the invention,
or part thereof, can be used to generate antibodies or antiserum, by techniques well known to those skilled
in the art. Antibodies or antiserum can also be used for quantitative analysis or detection of the protein of
the invention, by methods such as enzyme-linked immunosorbent assays (ELISA) or by any other
technique known to those skilled in the art. Another possible technique involves the use of marked
syntaxins, since Sec1-related proteins are known to bind to syntaxins.

In another embodiment of the present invention, the present polynucleotides and polypeptides can
be used to diagnose, treat, and/or prevent any of a large number of diseases and disorders characterized by
abnormal exocytosis, such as, but not limited to: allergies including hay fever, asthma, and urticaria;
neurologic disorders, a number of which result from abnormal neurotransmitter secretion (for example,
depression is associated with decreased serotonin secretion); autoimmune hemolytic anemia; cancers,
especially hormone-dependent cancers such as those stimulated by androgens (for example, prostate
cancer) or estrogens (for example, breast cancer), leukemias or lymphomas; ulcerative colitis; type 2
diabetes, which in some cases is associated with decreased insulin secretion; proliferative
glomerulonephritis; inflammatory bowel disease; growth failure due to decreased secretion of growth
hormone; multiple sclerosis; myasthenia gravis; rheumatoid and osteoarthritis; scleroderma;
Chediak-Higashi and Sjogren's syndromes; systemic lupus erythematosus; thyroiditis; toxic shock syndrome;
traumatic tissue damage; viral, bacterial, fungal and protozoal infections; and other
physiologic/pathologic disorders associated with induced or otherwise abnormal vesicular trafficking.

An association between the level of expression and/or activity of the present protein with the
presence or absence of any condition associated with abnormal vesicular trafficking, such as any of the
above-listed disorders, can readily be assessed by detecting the level of expression or activity of the
protein by, e.g., Northern blot, Western blot, ELISA, or any standard in vitro or in vivo assay for protein
activity, and correlating the observed level of expression or activity with the presence or absence of the
disorder. For those disorders found to be positively associated with the protein of the invention, a
diagnostic or screening assay can be readily developed where the detection of an elevated level of protein
or protein activity is indicative of the presence of the disease, or of a propensity to develop the disease.
Further, any such diseases or conditions can be treated or prevented by inhibiting the expression or
activity of the protein, for example by administering to a patient suffering from the disorder any inhibitor
including, but not limited to, antibodies, antisense oligonucleotides, dominant negative forms of the
protein, and small molecule inhibitors of protein expression or activity. Alternatively, disorders
negatively associated with the protein of the invention can be diagnosed or screened for by detecting the
level of the present protein or protein activity, where a decreased level of the protein or protein activity is
indicative of the presence of the disease, or of a propensity to develop the disease. Such disorders
negatively associated with the protein of the invention can be treated or prevented by increasing the level
of the protein or protein activity, for example by administering to a patient any of a number of agents
including, but not limited to, the protein itself, a polynucleotide encoding the protein, or a heterologous
compound that enhances the expression or activity of the protein.

Protein of SEQ ID NO:419 (internal designation 188-9-1-0-C10-CS)

The protein of SEQ ID NO:419, highly expressed in the brain and placenta, is encoded by the
cDNA of SEQ ID NO: 178, is localized preferentially in the endoplasmic reticulum, and is homologous to
the yeast integral membrane protein SFT2p, a member of the SNARE-related family (Genbank accession
number X79489). SFT2p is well conserved in C. elegans and in mice (accession numbers CAA93859
and AA790425 respectively), and plays an important role in the protein trafficking and fusion machinery
of eukaryotic cells. The 159-amino-acid-long protein of the invention, which is similar in size and in
membrane topology to the SFT2p protein, displays four conserved hydrophobic stretches from positions
36 to 56, 66 to 86, 98 to 118 and 122 to 142, forming a tetra-spanning membrane protein. This topology
is also found in the Got1p protein, another well-conserved SNARE-related protein with similar functions
to those of SFT2p protein (accession number AL010285 for P. falciparum, U23521 for C. elegans) as
described in Conchon et al., EMBO J., 18(14):3934-3946 (1999).
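
As an illustrative aid only, the topology stated above can be checked arithmetically: treating the four recited
stretches as 1-based, inclusive coordinates on a 159-residue chain, the chain divides into four membrane stretches
of 21 residues each and five intervening or terminal regions. The Python sketch below performs this bookkeeping.
The function name and the "membrane"/"non-membrane" labels are assumptions introduced for illustration, and the
sketch says nothing about the orientation of the protein in the membrane.

```python
from typing import List, Tuple

PROTEIN_LENGTH = 159  # length stated in the text
HYDROPHOBIC_STRETCHES = [(36, 56), (66, 86), (98, 118), (122, 142)]  # from the text


def topology_regions(length: int, stretches: List[Tuple[int, int]]) -> List[Tuple[str, int, int]]:
    """Split positions 1..length into alternating non-membrane and membrane regions.

    Stretches must be sorted, non-overlapping, 1-based inclusive (start, end) pairs.
    """
    regions = []
    previous_end = 0
    for start, end in stretches:
        if start <= previous_end or end < start or end > length:
            raise ValueError(f"invalid or overlapping stretch {start}-{end}")
        if start - previous_end > 1:
            regions.append(("non-membrane", previous_end + 1, start - 1))
        regions.append(("membrane", start, end))
        previous_end = end
    if previous_end < length:
        regions.append(("non-membrane", previous_end + 1, length))
    return regions


if __name__ == "__main__":
    for label, start, end in topology_regions(PROTEIN_LENGTH, HYDROPHOBIC_STRETCHES):
        print(f"{label:13s} {start:3d}-{end:3d} ({end - start + 1} residues)")
```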
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Eukaryotic proteins are synthesized within the endoplasmic reticulum (ER), are delivered from 
the ER to the Golgi complex for post-translational processing and sorting, and are transported from the 
Golgi to specific intracellular and extracellular destinations. This intracellular and extracellular movement 
of protein molecules is termed vesicle trafficking. Trafficking is accomplished by the packaging of 
protein molecules into specialized vesicles which bud from the donor organelle membrane and fuse to the
target membrane (Palade, Science 189:347-358 (1975)). 

Numerous proteins are necessary for the formation, targeting, and fusion of transport vesicles and 
for the proper sorting of proteins into these vesicles. The vesicle trafficking machinery includes coat 
proteins which promote the budding of vesicles from donor membranes, vesicle- and target-specific
identifiers (v-SNAREs and t-SNAREs) which bind to each other and dock the vesicle to the target
membrane (Nichols et al., Nature 387:199-202, 1997), and proteins which bind to SNARE complexes and 
initiate fusion of the vesicle to the target membrane (SNAPs). 

SFT2p is a conserved yeast protein with four transmembrane domains that is resident in punctate 
structures corresponding to the late Golgi compartment, and which enters presumptive retrograde
intra-Golgi vesicles whose fusion depends on two t-SNARE proteins, Sed5p and Sft1p (Wooding and Pelham,
Mol. Biol. Cell 9:2667-2680 (1998)). Its genetic interaction with Sed5p suggests that SFT2p is an 
additional membrane component involved in the docking or fusion process. In vivo experiments have 
shown that deletion of GOT1p or SFT2p alone does not affect cell growth, but repression of both of these
proteins results in a significant accumulation of ER membrane, suggesting that the presence of either
SFT2p or GOT1p is required for the maintenance of efficient ER-Golgi transport (Conchon et al., supra).
It has also been shown that Got1p normally facilitates Sed5p-dependent fusion events, while Sft2p
performs a related function in the late Golgi (Conchon et al., supra). 

The etiology of numerous human diseases and disorders can be attributed to defects in the 
trafficking of proteins to organelles or the cell surface. For example, defects in the trafficking of
membrane-bound receptors and ion channels have been implicated in cystic fibrosis (cystic fibrosis
transmembrane conductance regulator; CFTR), glucose-galactose malabsorption syndrome
(Na+/glucose cotransporter), hypercholesterolemia (low-density lipoprotein (LDL) receptor), and forms of
diabetes mellitus (insulin receptor). Abnormal hormonal secretion has been linked to disorders including
diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter
(thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone; ACTH).

Further, cancer cells secrete excessive amounts of hormones or other biologically active peptides. 
Disorders related to excessive secretion of biologically active peptides by tumor cells include: fasting 
hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension due to 
increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal medulla and 
sympathetic paraganglia; and carcinoid syndrome, which includes abdominal cramps, diarrhea, and
valvular heart disease, caused by excessive amounts of vasoactive substances (serotonin, bradykinin, 
histamine, prostaglandins, and polypeptide hormones) secreted from intestinal tumors. Ectopic synthesis 
and secretion of biologically active peptides (peptides not expected from a tumor) includes ACTH and 
vasopressin in lung and pancreatic cancers; parathyroid hormone in lung and bladder cancers; calcitonin
in lung and breast cancers; and thyroid-stimulating hormone in medullary thyroid carcinoma. 

It is believed that the protein of SEQ ID NO:419 or part thereof is an integral membrane protein
of the SNARE-related family, and more specifically is presumed to be the human homologue of the yeast SFT2p protein.
Thus, the protein of the invention plays a role in the secretory and endocytic pathway of eukaryotic cells
through fusion and transport of vesicles from the endoplasmic reticulum to late Golgi cisternae. Preferred
polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID NO:419 that form the four
transmembrane domains from positions 36 to 56, 66 to 86, 98 to 118 and 122 to 142. Other preferred
polypeptides of the invention are fragments of SEQ ID NO:419 having any of the biological activities 
described herein. 

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a new marker protein to selectively identify secretory and endocytic traffic,
preferably in the endoplasmic reticulum and more preferably in the late Golgi cisternae. For example, the 
protein of the invention or part thereof may be detected using specific antibodies generated against the 
protein using any technique known to those skilled in the art. Such organelle-specific antibodies may
then be used to identify cells with disrupted trafficking systems such as in differentiated tumor cells or to
differentiate specific organelle types in a cell cross-section using immunochemistry. In addition, the 
protein of the invention can be used to specifically identify cells of the brain and/or placenta, tissues in 
which the protein is overexpressed. 

Another embodiment of the present invention relates to methods of targeting heterologous
compounds, such as polypeptides or polynucleotides, to the endoplasmic reticulum and preferentially to
late Golgi vesicles by recombinantly or chemically fusing a fragment of the protein of the invention to the 
heterologous polypeptide or polynucleotide. Such fusion proteins may be engineered to contain a 
cleavage site located between a sequence encoding the protein of the invention and the heterologous 
protein sequence, so that the protein of the invention may be cleaved and purified away from the
heterologous moiety. Preferred fragments of the protein that can be used in such applications are the four
transmembrane domains or any other fragments of the protein of the invention, or part thereof, that may 
contain targeting signals for ER or Golgi organelles as defined in Conchon et al., supra; Wooding and 
Pelham, supra. Such heterologous compounds may be targeted to the secretory pathway to modulate
ER-Golgi endocytic and secretory activities. In one embodiment, the protein of the invention can be used to
screen peptide libraries for inhibitors of traffic activity, as detected by the accumulation of ER membranes
or Golgi vesicles as described in Conchon et al., supra. 

In still another embodiment, the protein of the invention is used to diagnose, prevent and/or treat 
any of a number of disorders in which trafficking and/or the fusion machinery is affected, including, but 
not limited to, endocrine, secretory, inflammatory, and gastrointestinal disorders, such as cancer, cystic
fibrosis (cystic fibrosis transmembrane conductance regulator; CFTR, as well as membrane-bound
receptors and ion channels associated with CFTR), glucose-galactose malabsorption syndrome
(Na+/glucose cotransporter), hypercholesterolemia (low-density lipoprotein (LDL) receptor), and forms of
diabetes mellitus (insulin receptor), abnormal hormonal secretion linked to disorders including diabetes
insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter (thyroid
hormone), Cushing's and Addison's diseases (adrenocorticotropic hormone; ACTH), disorders related to 
excessive secretion of biologically active peptides by tumor cells including fasting hypoglycemia due to 
increased insulin secretion from insulinoma-islet cell tumors, hypertension due to increased epinephrine 
and norepinephrine secreted from pheochromocytomas of the adrenal medulla and sympathetic
paraganglia, carcinoid syndrome, which includes abdominal cramps, diarrhea, and valvular heart disease,
caused by excessive amounts of vasoactive substances (serotonin, bradykinin, histamine, prostaglandins, 
and polypeptide hormones) secreted from intestinal tumors. Ectopic synthesis and secretion of 
biologically active peptides (peptides not expected from a tumor) includes ACTH and vasopressin in lung 
and pancreatic cancers; parathyroid hormone in lung and bladder cancers; calcitonin in lung and breast
cancers; and thyroid-stimulating hormone in medullary thyroid carcinoma.

An association between the level of expression and/or activity of the present protein and the
presence or absence of any condition associated with abnormal vesicular trafficking and/or secretion, such 
as any of the above-listed disorders, can readily be assessed by detecting the level of expression or 
activity of the protein by, e.g., Northern blot, Western blot, ELISA, or any standard in vitro or in vivo
assay for protein activity, and correlating the observed level of expression or activity with the presence or
absence of the disorder. For those disorders found to be positively associated with the protein of the 
invention, a diagnostic or screening assay can be readily developed where the detection of an elevated 
level of protein or protein activity is indicative of the presence of the disease, or of a propensity to 
develop the disease. Further, any such diseases or conditions can be treated or prevented by inhibiting the
expression or activity of the protein, for example by administering to a patient suffering from the disorder
any inhibitor including, but not limited to, antibodies, antisense oligonucleotides, dominant negative 
forms of the protein, and small molecule inhibitors of protein expression or activity. Alternatively, 
disorders negatively associated with the protein of the invention can be diagnosed or screened for by 
detecting the level of the present protein or protein activity, where a decreased level of the protein or 
protein activity is indicative of the presence of the disease, or of a propensity to develop the disease.
Such disorders that are negatively associated with the protein of the invention can be treated or prevented 
by increasing the level of the protein or protein activity, for example by administering to a patient any of a 
number of agents including, but not limited to, the protein itself, a polynucleotide encoding the protein, or 
a heterologous compound that enhances the expression or activity of the protein.
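
The screening logic described above, comparing a measured level against a control and interpreting the result according to whether the disorder is positively or negatively associated with the protein, can be sketched as follows. This is a hypothetical illustration only: the function name `flag_sample`, the `association` labels, and the twofold default cutoff are assumptions made for the example and are not values taught by the text.

```python
def flag_sample(sample_level: float,
                control_level: float,
                association: str,
                fold_threshold: float = 2.0) -> bool:
    """Return True if the sample level suggests the disorder or a propensity for it.

    association: "positive" if an elevated level is indicative of the disorder,
                 "negative" if a decreased level is indicative of the disorder.
    fold_threshold: hypothetical fold change treated as elevated or decreased.
    """
    if control_level <= 0 or sample_level < 0:
        raise ValueError("levels must be non-negative and the control must be positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")

    ratio = sample_level / control_level
    if association == "positive":
        return ratio >= fold_threshold        # elevated relative to control
    if association == "negative":
        return ratio <= 1.0 / fold_threshold  # decreased relative to control
    raise ValueError("association must be 'positive' or 'negative'")


if __name__ == "__main__":
    # Arbitrary illustrative numbers, not measured data.
    print(flag_sample(4.0, 1.0, "positive"))
    print(flag_sample(0.4, 1.0, "negative"))
```

In practice the cutoff would have to be established empirically for each assay (e.g. Northern blot, Western blot, or ELISA) and each disorder.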

Cancer cells secrete excessive amounts of hormones or other biologically active peptides. 
Therefore, in another embodiment, antagonists or inhibitors of the protein of the invention may be 
administered to a subject to treat or prevent cancers by inhibiting the traffic activity in transformed cells. 
Any type of cancer can be treated or prevented in this way, including, but not limited to, adenocarcinoma,
sarcoma, melanoma, lymphoma, and leukemia. In preferred embodiments, the cancers include cancers of
glands, tissues, and organs involved in secretion or absorption, such as prostate, pancreas, lung, tongue, 
brain, breast, bladder, adrenal gland, thyroid, liver, uterus, ovary, kidney, testes, and organs of the 
gastrointestinal tract including small intestine, colon, rectum, and stomach. In a particular aspect, 
antibodies which are specific for the protein of the invention may be used directly as an antagonist, or
indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells or tissues
which express the protein of the invention. In addition, the elevated amount of the protein of the 
invention in tumor cells can readily be used to diagnose or screen for cancer, e.g. by measuring and 
comparing the level of the protein in a cell to that of a control cell using a specific antibody detected by 
FACS or using any other detection method known to those of skill in the art. 

Protein of SEQ ID NO:297 (181-3-3-0-C9-CS)

The protein of SEQ ID NO:297, encoded by the cDNA of SEQ ID NO:56, is homologous to
synaptogyrin 1 (Trembl ID: Q9UGZ4). The protein of the invention is highly expressed in the brain and 
fetal brain, fetal liver and the testis. 

The protein of SEQ ID NO:297 is a splice variant of synaptogyrin 1. The splicing of the cDNA of
SEQ ID NO:56 is different for exon 3: whereas exon 3 of synaptogyrin 1 is 238 base pairs long, exon 3 of
SEQ ID NO:56 is 345 base pairs long. This introduces a frameshift and a stop codon. Thus, the protein of
SEQ ID NO:297 is identical to synaptogyrin 1 up to and including amino acid 122, whereas the remaining 22
amino acids are entirely different. When compared to synaptogyrin 1, the protein of the invention
presents the same N-terminal domain (which is highly conserved in all synaptogyrins) and 2 of the 4
transmembrane helixes. Preferred polypeptides of the invention are those that comprise amino acids 1 to 
16, which make up the N terminal cytoplasmic domain of the protein and which are highly conserved 
among all members of the synaptogyrin family (Kedra D et al. - Hum Genet. - 1998, 103(2):131-141).
Other preferred polypeptides of the invention are those that comprise amino acids 25 to 45 and/or 68 to 
88, which make up the two transmembrane alpha helixes. Thus it is believed that the protein of the
invention is a member of the synaptogyrin family. 
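
The frameshift noted above follows from simple arithmetic on the two exon 3 lengths: a length difference that is not a multiple of three changes the reading frame of everything downstream. The short Python sketch below works through the figures stated in the text (238 and 345 base pairs; identity through amino acid 122 followed by 22 different amino acids). The function name `shifts_reading_frame` is illustrative only.

```python
CODON_LENGTH = 3  # nucleotides per codon


def shifts_reading_frame(reference_exon_bp: int, variant_exon_bp: int) -> bool:
    """Return True if swapping the exons changes the downstream reading frame."""
    return (variant_exon_bp - reference_exon_bp) % CODON_LENGTH != 0


if __name__ == "__main__":
    reference_bp = 238  # exon 3 of synaptogyrin 1, as stated in the text
    variant_bp = 345    # exon 3 of SEQ ID NO:56, as stated in the text

    extra_bp = variant_bp - reference_bp
    print(extra_bp)                                        # 107
    print(extra_bp % CODON_LENGTH)                         # 2
    print(shifts_reading_frame(reference_bp, variant_bp))  # True

    # Length implied by the text: 122 shared residues plus 22 divergent ones.
    print(122 + 22)                                        # 144
```

Because the additional 107 base pairs leave a remainder of 2 when divided by 3, the codons after the splice junction are read in a different frame, which is consistent with the divergent C-terminal residues and the premature stop codon described above.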

Synaptogyrins are closely related to proteins of the synaptophysin family, both of which are 
involved in neurotransmission and more generally in exocytosis and vesicle trafficking. Members of the 
synaptogyrin family include synaptogyrin 1 (with splice variants 1a, 1b and 1c), cellugyrin (synaptogyrin
2) and synaptogyrin 3. This family of proteins is also evolutionarily conserved, as homologues to human 
synaptogyrin 1 have been found in rats, mice, and C. elegans. Synaptogyrins and synaptophysin are 
among the most abundant vesicle components; together they account for more than 10% of the total
vesicle membrane proteins. Although synaptogyrins do not appear to be required for exocytosis itself
(apparently because synaptogyrins and synaptophysins have overlapping functions), they are essential for
the normal regulation of exocytosis. 

The normal function and organization of eukaryotic cells is dependent on the transport of various 
vesicles that selectively shuttle membrane and cargo between distinct compartments of the secretory and 
endocytotic pathways. A number of key proteins involved in membrane targeting and exocytosis have
been identified, and a fundamental set of interactions has been defined and placed into a model called the
SNARE (Soluble N-ethylmaleimide-sensitive Attachment protein REceptor) hypothesis (Rothman J -
Nature - 1994, 372: p55-63). According to the SNARE hypothesis, vesicles dock to a target membrane 
through the interaction of complementary sets of vesicular (v-SNARE) and target (t-SNARE) membrane 
proteins. Our understanding of vesicle trafficking has, to a large extent, been facilitated by
characterization of synaptic vesicles in neurons. In synaptic vesicle exocytosis, the vesicular proteins
synaptobrevin (also called Vesicle-Associated Membrane Protein; VAMP) and synaptogyrin are the
v-SNAREs, and the plasma membrane-associated proteins SNAP-25 (Synaptosomal-Associated Protein of 25
kDa) and syntaxin 1 function as t-SNAREs. Formation of the SNARE complex (or core complex) is
followed by recruitment of the cytosolic proteins alpha, beta and gamma SNAP (Soluble
N-ethylmaleimide-sensitive Attachment Protein) and NSF (N-ethylmaleimide-Sensitive Factor), which are
required for membrane fusion. In transfected PC12 cells, synaptogyrin 1 and synaptophysin 1 are as
effective as tetanus toxin light chain in inhibiting exocytosis (Sugita S. et al. - J Biol Chem. - 1999,
274(27):18893-901), suggesting that these proteins are strong regulators of exocytosis. More recently,
synaptogyrins have been found to have an essential function in synaptic plasticity (Janz R. et al. - Neuron
- 1999, 24(3):687-700).

The etiology of numerous human diseases and disorders can be attributed to defects in the 
trafficking of proteins to organelles or the cell surface. For example, defects in the trafficking of 
membrane-bound receptors and ion channels have been implicated in cystic fibrosis (cystic fibrosis 
transmembrane conductance regulator; CFTR), glucose-galactose malabsorption syndrome
(Na+/glucose cotransporter), hypercholesterolemia (low-density lipoprotein (LDL) receptor), and forms of
diabetes mellitus (insulin receptor). Abnormal hormonal secretion has been linked to disorders including
diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter
(thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone; ACTH). 

Further, cancer cells secrete excessive amounts of hormones or other biologically active peptides.
Disorders related to excessive secretion of biologically active peptides by tumor cells include: fasting 
hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension due to 
increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal medulla and 
sympathetic paraganglia; and carcinoid syndrome, which includes abdominal cramps, diarrhea, and
valvular heart disease, caused by excessive amounts of vasoactive substances (serotonin, bradykinin,
histamine, prostaglandins, and polypeptide hormones) secreted from intestinal tumors. Ectopic synthesis 
and secretion of biologically active peptides (peptides not expected from a tumor) includes ACTH and 
vasopressin in lung and pancreatic cancers; parathyroid hormone in lung and bladder cancers; calcitonin 
in lung and breast cancers; and thyroid-stimulating hormone in medullary thyroid carcinoma. 

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a new marker protein to selectively identify secretory and endocytic traffic,
preferably in the endoplasmic reticulum and more preferably in the late Golgi cisternae. For example, the 
protein of the invention or part thereof may be detected using specific antibodies generated against the 
protein using any technique known to those skilled in the art. Such organelle-specific antibodies may
then be used to identify cells with disrupted trafficking systems such as in differentiated tumor cells or to
differentiate specific organelle types in a cell cross-section using immunochemistry. In addition, the 
protein of the invention can be used to specifically identify cells of the brain, fetal brain, fetal liver and 
the testis, tissues in which the protein is overexpressed. 

Another embodiment of the present invention relates to methods of targeting heterologous
compounds, such as polypeptides or polynucleotides, to the components of the secretory machinery by
recombinantly or chemically fusing a fragment of the protein of the invention to the heterologous 
polypeptide or polynucleotide. Such fusion proteins may be engineered to contain a cleavage site located 
between a sequence encoding the protein of the invention and the heterologous protein sequence, so that 
the protein of the invention may be cleaved and purified away from the heterologous moiety. Such
heterologous compounds may be targeted to the secretory pathway to modulate ER-Golgi endocytic and
secretory activities. In one embodiment, the protein of the invention can be used to screen peptide 
libraries for inhibitors of traffic activity, as detected by the accumulation of ER membranes or Golgi 
vesicles as described in Conchon et al., supra. 

In still another embodiment, the protein of the invention is used to diagnose, prevent and/or treat
any of a number of disorders in which trafficking and/or the fusion machinery is affected, including, but 
not limited to, endocrine, secretory, inflammatory, and gastrointestinal disorders, such as cancer, cystic 
fibrosis (cystic fibrosis transmembrane conductance regulator; CFTR, as well as membrane-bound 
receptors and ion channels associated with CFTR), glucose-galactose malabsorption syndrome
(Na+/glucose cotransporter), hypercholesterolemia (low-density lipoprotein (LDL) receptor), and forms of
diabetes mellitus (insulin receptor), abnormal hormonal secretion linked to disorders including diabetes
insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter (thyroid
hormone), Cushing's and Addison's diseases (adrenocorticotropic hormone; ACTH), disorders related to
excessive secretion of biologically active peptides by tumor cells including fasting hypoglycemia due to
increased insulin secretion from insulinoma-islet cell tumors, hypertension due to increased epinephrine 
and norepinephrine secreted from pheochromocytomas of the adrenal medulla and sympathetic 
paraganglia, carcinoid syndrome, which includes abdominal cramps, diarrhea, and valvular heart disease, 
caused by excessive amounts of vasoactive substances (serotonin, bradykinin, histamine, prostaglandins,
and polypeptide hormones) secreted from intestinal tumors. Ectopic synthesis and secretion of
biologically active peptides (peptides not expected from a tumor) includes ACTH and vasopressin in lung 
and pancreatic cancers; parathyroid hormone in lung and bladder cancers; calcitonin in lung and breast 
cancers; and thyroid-stimulating hormone in medullary thyroid carcinoma. 

An association between the level of expression and/or activity of the present protein and the
presence or absence of any condition associated with abnormal vesicular trafficking and/or secretion, such
as any of the above-listed disorders, can readily be assessed by detecting the level of expression or
activity of the protein by, e.g., Northern blot, Western blot, ELISA, or any standard in vitro or in vivo
assay for protein activity, and correlating the observed level of expression or activity with the presence or
absence of the disorder. For those disorders found to be positively associated with the protein of the
invention, a diagnostic or screening assay can be readily developed where the detection of an elevated
level of protein or protein activity is indicative of the presence of the disease, or of a propensity to 
develop the disease. Further, any such diseases or conditions can be treated or prevented by inhibiting the 
expression or activity of the protein, for example by administering to a patient suffering from the disorder 
any inhibitor including, but not limited to, antibodies, antisense oligonucleotides, dominant negative
forms of the protein, and small molecule inhibitors of protein expression or activity. Alternatively,
disorders negatively associated with the protein of the invention can be diagnosed or screened for by 
detecting the level of the present protein or protein activity, where a decreased level of the protein or 
protein activity is indicative of the presence of the disease, or of a propensity to develop the disease. 
Such disorders that are negatively associated with the protein of the invention can be treated or prevented 
by increasing the level of the protein or protein activity, for example by administering to a patient any of a
number of agents including, but not limited to, the protein itself, a polynucleotide encoding the protein, or 
a heterologous compound that enhances the expression or activity of the protein. 

Cancer cells secrete excessive amounts of hormones or other biologically active peptides. 
Therefore, in another embodiment, antagonists, inhibitors, or other modulators of the protein of the
invention may be administered to a subject to treat or prevent cancers by inhibiting the traffic activity in 
transformed cells. Any type of cancer can be treated or prevented in this way, including, but not limited 
to, adenocarcinoma, sarcoma, melanoma, lymphoma, and leukemia. In preferred embodiments, the 
cancers include cancers of glands, tissues, and organs involved in secretion or absorption, such as
prostate, pancreas, lung, tongue, brain, breast, bladder, adrenal gland, thyroid, liver, uterus, ovary, kidney,
testes, and organs of the gastrointestinal tract including small intestine, colon, rectum, and stomach. In a 
particular aspect, antibodies which are specific for the protein of the invention may be used directly as an 
antagonist, or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells 
or tissues which express the protein of the invention. 

In addition, the present protein can be used to diagnose, treat, and prevent any neurological or
psychiatric disorder or condition associated with abnormal neurotransmitter release, such as depression,
which is associated with decreased serotonin secretion, or any neurological function, e.g. memory, which 
could be enhanced or otherwise modulated by altering the quantity, frequency, or any other property of 
neurotransmitter release in one or more cell types in the nervous system. 

Proteins of SEQ ID NOs:247 and 246 (internal designations 105-031-2-0-D3-CS and 105-031-1-0-A2-CS)

The proteins of SEQ ID NOs:247 and 246, encoded by the cDNAs of SEQ ID NOs:6 and 5,
respectively, are overexpressed in liver, pancreas, and prostate. The proteins of the invention are strongly
homologous to the human membrane-bound protein PRO836 (GENSEQP accession number: W63687),
and to the human secreted protein 7 (GENSEQP accession number: Y57941). The proteins of the
invention also share homology with the chaperone-associated protein, SLS1p, found in the yeast Yarrowia
lipolytica (GENPEPT accession number Z50154), having 27% identity from amino acids 68 to 340 of
the protein of SEQ ID NO:247. In addition, the proteins of SEQ ID NOs:247 and 246 share homology with
two Hsp70 family proteins, Hsp-binding protein 1 found in mice (GENSEQP accession number: Z50154),
and human species (GENPEPT accession number: AF093420), and Hsp-binding protein 2 found in
human species (GENPEPT accession number: AF187859). 

The proteins of the invention are related to a yeast lumen protein of the endoplasmic reticulum, 
SLS1p. This protein acts in the preprotein translocation process, interacting directly with translocating
polypeptides to facilitate their transfer and/or help their folding in the endoplasmic reticulum (Boisrame et
al. J Biol Chem 1996; 271:11668-75). In addition, Sls1p is believed to act as a cofactor of the chaperone
protein Kar2 (Boisrame et al. J Biol Chem 1998; 273:30903-8 ; Kabani et al. Gene 2000; 241:309-15). 
Thus, the proteins of the invention are presumed to have cellular functions similar to those of chaperones.
Such functions include a number of cellular processes, such as protein folding, disassembly of oligomeric
protein structures, regulation of apoptosis, protein degradation, protein translocation in the endoplasmic 
reticulum, and antigen-presentation (Bukau et al. Cell 1998; 92:351-66). Chaperones are also involved in 
a number of disorders, especially autoimmune diseases such as type 1 diabetes, rheumatoid arthritis, 
systemic lupus erythematosus, Sjogren syndrome, and mixed connective tissue disease (Feige et al. EXS
1996; 77:359-73; Feili-Hariri et al. J Autoimmun 2000; 14:133-42). Chaperones are also involved in
various disorders including tuberculosis and leprosy (Zugel et al. Clin Microbiol Rev 1999; 12:19-39),
neurodegenerative disorders such as Alzheimer's and Parkinson's diseases (Yoo et al. J Neural Transm Suppl
1999; 57:315-22), and malignant disorders (Csermely et al. Pharmacol Ther 1998; 79:129-68). In 
addition, a growing body of evidence suggests the involvement of the Hsp60 chaperone in the
development of atherosclerosis (Xu et al. Circulation 2000; 102:14-20). Thus, the present proteins, which
are presumed to be co-factors of a chaperone as summarized above, are believed to have analogous cellular
functions and to be involved in similar pathological processes. 

In one embodiment, the present invention provides methods of using the present proteins to
identify specific cell types in vitro and in vivo. For example, as chaperone proteins are often upregulated
in response to cellular stress, the detection of cells expressing elevated levels of the proteins provides a
tool for detecting cells under stress. As cellular stress has been implicated in a number of disorders, such 
as cardiovascular disorders, neurodegenerative disorders, and cancer, the ability to detect such stress thus 
provides a diagnostic or screening tool for such conditions. In addition, the present polypeptides and 
polynucleotides can be used to identify liver, pancreas, and prostate tissues, and cells derived from these
tissues. The ability to specifically visualize such tissues and cells is useful for a number of applications,
including to determine the origin or identity of, e.g. cancerous cells, as well as to facilitate the 
identification of particular cells and tissues for, e.g. the evaluation of histological slides. 

In addition, the present polypeptides and polynucleotides can be used to develop diagnostic and 
screening assays for diseases characterized by an abnormal level or activity of the proteins of SEQ ID
NOs:247 and 246. Such disorders include, but are not limited to, infectious diseases, neurodegenerative
disorders such as Alzheimer's and Parkinson's diseases, schizophrenia, alopecia, aging, atherosclerosis, 
malignant disorders of various types, and autoimmune diseases including type 1 diabetes, rheumatoid 
arthritis, systemic lupus erythematosus, Sjogren syndrome, and mixed connective tissue disease. Such 
assays can be performed using any biological sample, such as serum or plasma. 

In still another embodiment, the proteins of the invention or part thereof can be used to prevent
cells from undergoing apoptosis. Specifically, as chaperone proteins have been shown to protect cells 
from apoptosis, any method of increasing the level or activity of the present protein can be used to 
prevent cells from undergoing apoptosis, in vitro or in vivo. For example, a polynucleotide encoding a 
protein of SEQ ID NO:247 or 246, or any fragment or derivative thereof, can be introduced into cells, e.g.
in a vector, wherein the protein is expressed in the cells. Alternatively, a protein of SEQ ID NO:247 or 
246 itself can be administered to cells, preferably in a formulation that leads to the internalization of the 
protein by the cells. Also, any compound that increases the expression or activation of the proteins within 
the cells can be administered. Preventing cells from undergoing apoptosis can be used for any of a large
number of purposes, including, but not limited to, to prevent the death of cells being grown in culture, to
prevent in a patient the apoptosis associated with any of a number of disorders, or to prevent apoptosis in 
cells of a patient undergoing a treatment that increases the level of cellular stress, such as chemotherapy. 

In another embodiment, inhibiting the proteins of the invention can be used to induce apoptosis in 
undesired cells. Such inhibition can be accomplished in any of a number of ways, including, but not
limited to, using antibodies, antisense sequences, dominant negative forms of the protein, or small
molecule inhibitors of the expression or activity of the proteins. Such induction of apoptosis can be used 
to eliminate any undesired cells, for example cancer cells, in a patient. Preferably, such inhibitors are 
targeted specifically to the undesired cells in the patient. 

In another embodiment, various disorders can be treated, attenuated and/or prevented by a protein
of SEQ ID NOs:247 or 246, or part thereof, or any other compound that can affect the level or activity of
the proteins such as nucleic acids, antibodies, or chemical substances. In a preferred embodiment,
proteins or other compounds directed to the proteins of the invention can be used to treat or prevent
disorders in which the activity or level of the proteins of SEQ ID NO:247 or 246 is unbalanced. Such
diseases include, but are not limited to, infectious diseases, neurodegenerative disorders such as
Alzheimer's and Parkinson's diseases, schizophrenia, alopecia, aging, atherosclerosis, malignant disorders
of various types, and autoimmune diseases including type 1 diabetes, rheumatoid arthritis, systemic lupus
erythematosus, Sjogren syndrome, and mixed connective tissue disease. In another embodiment, the proteins
of SEQ ID NO:247 or 246 or part thereof can be used as vaccines for various disorders including, but not
limited to, cancer (Wang et al. Immunol Invest 2000;29:131-7), tuberculosis (Silva et al. Microbes Infect
1999;1:429-35), diabetes (Int Immunol 1999;11:957-66), and atherosclerosis (Xu et al. Arterioscler Thromb
1992;12:789-99).

Protein of SEQ ID NO:389 (internal designation 109-003-1-0-G4-CS)

The protein of SEQ ID NO:389 is encoded by the cDNA of SEQ ID NO:148. Accordingly, it
will be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:389 described
throughout the present application also pertain to the polypeptide encoded by the human cDNA of clone
109-003-1-0-G4-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic acid
of SEQ ID NO:148 described throughout the present application also pertain to the human cDNA of clone
109-003-1-0-G4-CS. The protein of SEQ ID NO:389 is highly homologous to two human proteins
encoded by genes listed in Genbank under accession numbers AF143723 and AF112210, the disclosures
of which are incorporated herein by reference in their entireties.

The polypeptides encoded by Genbank accession numbers AF143723 and AF112210 belong to the
Hsp70 protein family (even though one of them has erroneously been attributed to the related Hsp60
family). Many genes encoding "Hsps" (heat shock proteins) have been cloned and sequenced, including,
for example, human hsp70 (GenBank Accession Nos. M11717 and M15432; see also Hunt and
Morimoto, 1985, Proc. Natl. Acad. Sci. USA 82: 6455-6459, the disclosures of which are incorporated
herein by reference in their entireties), human hsp90 (GenBank Accession No. X15183; see also
Yamazaki et al., 1989, Nucleic Acids Res. 17: 7108, the disclosures of which are incorporated herein by
reference in their entireties), and human gp96 (GenBank Accession No. M33716; see also Maki et al.,
1990, Proc. Natl. Acad. Sci. USA 87: 5658-5662, the disclosures of which are incorporated herein by
reference in their entireties).

The protein of SEQ ID NO:389 and the two homologs mentioned above are actually closer to
yeast members of the family than to the human Hsp70, which makes the corresponding genes previously
unidentified human members of the family. Both the Pfam and Prosite Hsp70 signatures (respectively the
"HSP70" Pfam model from amino acid position 3 to 509 and the PS01036 Prosite motif from position
332 to 346) are recognized within the protein of SEQ ID NO:389. The protein of SEQ ID NO:389
differs from the protein encoded by AF112210 at amino acid positions 282, 312 and 326, and from the
protein encoded by AF143723 at amino acid positions 15 and 326.
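
Positional differences of the kind noted above can be checked computationally. The following is a minimal sketch, assuming the two protein sequences have the same length and are already aligned without gaps; the function name `differing_positions` and the toy sequences are hypothetical and are not the sequences of SEQ ID NO:389 or of the Genbank entries.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 1-based positions at which two equal-length, ungapped sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (ungapped alignment assumed)")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# Toy illustration only; these are not real sequences from the application.
query = "MSKGPAVGIDLGTTYS"
homolog = "MSKGPAIGIDLGTTFS"
print(differing_positions(query, homolog))  # prints [7, 15]
```

Sequences of unequal length would first require a proper alignment, which this sketch does not attempt.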

Heat shock proteins are a family of molecular chaperone proteins which have long been known to
play essential roles in a multitude of intra- and intercellular processes, including protein synthesis and
folding, vesicular trafficking, and antigen processing and presentation. Hsps are among the most highly
conserved proteins known, and carry out many of their regulatory activities via protein-protein
interactions. Historically they were identified by induction under conditions of stress, during which they
are now known to provide an essential action of preventing aggregation and assisting refolding of
misfolded proteins. The major stress proteins accumulate to very high levels in stressed cells but occur at
low to moderate levels in cells that have not been stressed.

Hsp70 is one member of the heat shock protein family (Milner, C. M. and Campbell, R. D.
Immunogenetics 32: 242-251 (1990); Genbank Accession No. M59828, the disclosures of which are
incorporated herein by reference in their entireties). The 70kD heat shock protein is a highly conserved,
ubiquitous protein involved in chaperoning proteins to various cellular organelles. Unlike other
members of the Hsp family, it is highly inducible in mammals. Although Hsp70 is barely detectable at
normal temperatures, it becomes one of the most actively synthesized proteins in the cell upon heat shock
(Welch et al., 1985, J. Cell. Biol. 101:1198-1211, the disclosure of which is incorporated herein by
reference in its entirety). In contrast, the Hsp90 and Hsp60 proteins are abundant at normal temperatures
in most, but not all, mammalian cells and are further induced by heat (Lai et al., 1984, Mol. Cell. Biol.
4:2802-10; van Bergen en Henegouwen et al., 1987, Genes Dev., 1:525-31, the disclosures of which are
incorporated herein by reference in their entireties). Furthermore, the Hsp70 proteins act as monomers
whereas the functionally related Hsp60 proteins are associated in vivo within large double ring assemblies
of nearly a million daltons. The various actions of the Hsps all rely basically on their ability to complex
polypeptide segments, preferably hydrophobic, and to stabilize them in an extended conformation in an
ATP-dependent manner. The complexed polypeptides can be antigenic peptides (in which case the Hsps
help direct them to the major histocompatibility complexes for presentation) or misfolded proteins
which are helped to adopt the proper conformation by repeated cycles of binding to Hsps followed by
release/refolding (see Bukau, B. and Horwich L., 1998, Cell 92: 351-366, the disclosure of which is
incorporated herein by reference).

On the basis of the above information, it is believed that the protein of SEQ ID NO:389 is a
member of the human Hsp70 family. Accordingly, the protein of SEQ ID NO:389 may play a role in
protein synthesis/folding, cellular trafficking, antigen processing, the cellular stress response and the
immune response in immunocompetent cell types. Additional information regarding the protein of SEQ
ID NO:389 may be obtained by performing a binding assay with a consensus Hsp70 substrate using the
methods described in Rudiger et al., 1997, EMBO J. 16, 1501-1507, the disclosure of which is
incorporated herein by reference in its entirety.

One embodiment of the present invention relates to methods of using the protein of SEQ ID
NO:389 or fragments comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200
consecutive amino acids thereof, or fragments having a desired biological activity as a stabilizing
adjuvant to slow down protein degradation, boost the yields of recombinant proteins or regenerate
denatured proteins. In such an embodiment, the protein of SEQ ID NO:389 or a fragment thereof is mixed
with a composition comprising the protein for which it is desired to slow down degradation, boost yield,
or regenerate denatured proteins under conditions which facilitate the desired result.
For example, numerous commercial assay kits commonly used by those skilled in the arts of
molecular biology and biochemistry depend on the biological properties of proteins (mostly enzymes)
which can be very short-lived in vitro due to the low stability of those proteins. An example is described
in Eur. Patent DE4124286, the disclosure of which is incorporated herein by reference in its entirety,
wherein the low intrinsic stability of test solutions used in optical tests is increased by addition of
chaperone proteins, thus making the test more sensitive.

The protein of SEQ ID NO:389 may also be used to increase the yield or activity of recombinant
proteins. In recombinant DNA technology, a major unsolved problem is the solubility and biological
activity of the recombinantly overexpressed protein in a host, especially a bacterial or yeast host. Many
eukaryotic proteins, especially the secreted ones, require for correct folding a specific cellular machinery
which is lacking in bacterial hosts such as E. coli or becomes insufficient in mammalian/yeast cells due to
high expression of the protein. The ability of the protein of SEQ ID NO:389 or fragments thereof to
ensure proper folding of recombinant proteins may be utilized as follows. The protein of SEQ ID
NO:389 may be coexpressed with the recombinant protein in bacterial or eukaryotic hosts to cause the
hosts to express the heterologous proteins or polypeptides in a form having increased solubility and/or
biological activity. For example, the protein of SEQ ID NO:389 or fragments thereof may be used in the
methods described in PCT application WO 93/25681, the disclosure of which is incorporated herein by
reference in its entirety. Alternatively, the protein of SEQ ID NO:389 or fragments thereof may be
exogenously added to the cell cultures as described in PCT application WO 00/08135, the disclosure of
which is incorporated herein by reference in its entirety. Indeed, PCT application WO 00/31113, the
disclosure of which is incorporated herein by reference in its entirety, shows that when added
exogenously to cells, Hsp70 is readily imported into both cytoplasmic and nuclear compartments.
Preparation and purification of the protein of SEQ ID NO:389 or fragments thereof may be carried out as
described in Patent US-6,007,821, the disclosure of which is incorporated herein by reference in its
entirety.

The protein of SEQ ID NO:389 or fragments thereof may further be used to regenerate denatured
proteins. Recombinantly expressed proteins with poor biological activity are routinely denatured with a
potent denaturing agent, such as guanidine hydrochloride, followed by refolding by dilution with a large
amount of a diluent to reduce the concentration of the denaturing agent. However, this method often
results in a poor refolding rate which may be significantly increased by addition of a cocktail of
chaperone proteins in a fashion similar to that described for Hsp60 in Eur. Patent EP0650975, the
disclosure of which is incorporated herein by reference in its entirety. The advantage of using a cocktail
of chaperone proteins is to accommodate differences in binding specificity of the different Hsp families
and the different members within each family. For instance, vertebrate actin is efficiently folded by the
chaperonin of the eukaryotic cytosol (Gao et al., 1992, Cell 69:1043-1050, the disclosure of which is
incorporated herein by reference in its entirety) but not at all by Hsp60 (Tian et al., 1995, Nature 375:250-253,
the disclosure of which is incorporated herein by reference in its entirety).
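
The effect of refolding by dilution on the concentration of the denaturing agent follows from simple dilution arithmetic. The sketch below is illustrative only, assuming the diluent contains no denaturing agent and that volumes are additive; the function name `diluted_concentration` and the example figures (6 M denaturant, 100-fold dilution) are hypothetical and are not taken from the cited references.

```python
def diluted_concentration(initial_molar: float, sample_volume: float, diluent_volume: float) -> float:
    """Denaturant concentration after mixing a sample with denaturant-free diluent.

    Volumes may be in any unit, provided both use the same one.
    """
    if sample_volume <= 0 or diluent_volume < 0:
        raise ValueError("sample volume must be positive and diluent volume non-negative")
    return initial_molar * sample_volume / (sample_volume + diluent_volume)


# Hypothetical example: 1 volume of sample in 6 M denaturant plus 99 volumes of diluent.
print(diluted_concentration(6.0, 1.0, 99.0))
```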

Another embodiment of the present invention relates to the use of the protein of SEQ ID NO:389
or fragments thereof to deliver heterologous compounds (proteins, peptides, or DNA) to specific cellular
compartments, preferably the cytoplasm and the nucleus. If desired, the protein of SEQ ID NO:389 or a
fragment thereof may be fused to the heterologous compound. For example, the protein of SEQ ID
NO:389 or fragments thereof may be used to chaperone compounds into cells using the methods
described in PCT application WO 00/31113, the disclosure of which is incorporated herein by reference
in its entirety. In the methods described in WO 00/31113, Hsp70 was used to deliver NF-κB, a key
transcriptional regulator of inflammatory responses, into the nuclear compartment. It was shown that a
fusion protein composed of a C-terminal Hsp70 peptide and amino acids 37-409 of the p50 subunit of
NF-κB was directed into the nucleus of cells, could bind DNA specifically, and activated kappa Ig expression
and TNFα production.

In one embodiment of the present invention, the protein of SEQ ID NO:389 or a fragment thereof
may be used in human therapy as a modulator of immune response. Disease states which may be treated
by Hsp70, fragments thereof, and/or Hsp70 complexes of the present invention include transplant
rejection (see US5,891,653, the disclosure of which is incorporated herein by reference in its entirety) and
autoimmune diseases, such as insulin dependent diabetes mellitus, rheumatoid arthritis, multiple sclerosis,
juvenile diabetes, asthma, and inflammatory bowel disease, as well as inflammatory diseases, cancer,
viral replication diseases and vascular diseases as described in the following patents, each of which is
incorporated herein by reference in its entirety: US6,007,821; WO 00/31113; WO 99/18801 (treatment of
auto-immune diseases); US6,017,540; US6,017,544; AU3425899; WO 99/54464; US5,837,251;
US5,830,464; WO 98/34642; WO 98/34641; US5,750,119; WO 97/10001; WO 96/10411 (cancer
treatment); DE19813760, DE19813759 (both autoimmune disease and cancer).

The protein of SEQ ID NO:389 or fragments thereof may also be used to treat or ameliorate
autoimmune disease. In this embodiment, compositions of complexes of heat shock/stress proteins
(including, but not limited to, the protein of SEQ ID NO:389) are administered to an individual suffering
from an autoimmune disease. The complexes may be comprised of the protein of SEQ ID NO:389 or
fragments thereof alone or may include other heat shock/stress proteins. In one embodiment, the protein
of SEQ ID NO:389 or a fragment thereof is bound noncovalently to antigenic molecules and administered
to individuals suffering from autoimmune disease to suppress the autoimmune response. Alternatively,
compositions comprising the protein of SEQ ID NO:389 or fragments thereof in an un-complexed form
(i.e., free of antigenic molecules) may also be administered to an individual suffering from autoimmune
disease to suppress the immune response (see Patent US6,007,821, the disclosure of which is incorporated
herein by reference in its entirety).

The ability of stress proteins to chaperone the antigenic peptides of the cells from which they are
derived allows them to be used to isolate the antigenic peptides expressed in a tumor. In this
embodiment of the present invention, complexes comprising the protein of SEQ ID NO:389 or fragments
thereof and an antigenic peptide expressed by the tumor are isolated. The isolated complexes are
administered back to the individual from which they were obtained in order to elicit an immune response
against the tumor. Accordingly, this approach circumvents the necessity of isolating and characterizing
specific tumor antigens and enables the skilled artisan to readily prepare immunogenic compositions
effective against a tumor in an individual (see Patent US6,017,544, the disclosure of which is
incorporated herein by reference in its entirety).

The protein of SEQ ID NO:389 may also be used to diagnose bladder cancer. The segment of the
protein of SEQ ID NO:389 extending between amino acid positions 1 through 187 is more than 99%
identical to a polypeptide which is linked to bladder cancer (see Eur. Patent DE19818620, the disclosure
of which is incorporated herein by reference in its entirety). The 187 amino-acid long polypeptide
described in DE19818620 was identified as the partial product of the only gene for which expression was
significantly altered in a bladder tumour compared to a healthy bladder. In another embodiment of the
present invention, the protein of SEQ ID NO:389 or a fragment thereof may be used to diagnose
disorders associated with altered intercellular communication or secretion. In such techniques, the level
of the protein of SEQ ID NO:389 in an individual is measured using techniques such as those described
herein. The level of the protein of SEQ ID NO:389 in the individual is compared to the level in normal
individuals. An altered level of the protein of SEQ ID NO:389 relative to normal individuals suggests
that the individual is suffering from bladder cancer. The level of the protein of SEQ ID NO:389 present
in the individual may be determined by contacting a sample from the individual with an antibody directed
against the polypeptide of SEQ ID NO:389. Alternatively, the level of the protein of SEQ ID NO:389 in
the individual may be measured by determining the level of RNA encoding the protein of SEQ ID
NO:389 in the sample. RNA levels may be measured using nucleic acid arrays or using techniques such
as in situ hybridization, Northern blots, dot blots or other techniques familiar to those skilled in the art. If
desired, an amplification reaction, such as a PCR reaction, may be performed on the nucleic acid sample
prior to analysis. The level of RNA in the sample is compared to RNA levels in normal individuals to
determine whether the individual is suffering from bladder cancer.
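
The comparison of a measured level with the levels found in normal individuals can be sketched as follows. This is a minimal illustration, assuming that levels are expressed in arbitrary units and that an "altered" level is one lying more than a chosen number of standard deviations from the mean of the normal group; the function name `is_altered`, the cutoff of 2.0 and the example values are hypothetical and are not taken from the present application.

```python
from statistics import mean, stdev


def is_altered(sample_level: float, normal_levels: list[float], cutoff: float = 2.0) -> bool:
    """Flag a level lying more than `cutoff` standard deviations from the normal mean."""
    if len(normal_levels) < 2:
        raise ValueError("at least two normal measurements are needed")
    mu = mean(normal_levels)
    sigma = stdev(normal_levels)  # sample standard deviation of the normal group
    if sigma == 0:
        return sample_level != mu
    return abs(sample_level - mu) / sigma > cutoff


# Hypothetical protein or RNA levels in arbitrary units.
normal = [10.0, 11.0, 9.0, 10.5, 9.5]
print(is_altered(10.2, normal))  # prints False
print(is_altered(15.0, normal))  # prints True
```

Any cutoff used in practice would have to be established empirically for the assay in question.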

Antibodies against the protein of SEQ ID NO:389 or nucleic acid probes
complementary to the sequence encoding the protein of SEQ ID NO:389 may also be used as a prognostic
indicator of breast tumor recurrence, as described in U.S. Patent No. 5,188,964, the disclosure of which is
incorporated herein by reference in its entirety. As described in U.S. Patent No. 5,188,964, specific levels
of the stress response proteins (including Hsp70) were identified, above which the probability of tumor
recurrence is highly significant. Accordingly, the levels of the protein of SEQ ID NO:389 or RNA
encoding the protein of SEQ ID NO:389 may be determined in a sample from an individual who has
experienced a breast tumor in the past. Protein or RNA levels may be measured as described herein. If
the protein or RNA levels exceed the levels above which tumor recurrence is likely, an appropriate
course of treatment may be initiated.

In another embodiment of the present invention, the protein of SEQ ID NO:389 may be used to
promote tissue repair and/or increase cell survival in stress conditions such as hypoxia, oxidative stress,
genotoxic agents and more generally harmful conditions leading to programmed cell death. The beneficial
effect is produced either by protecting the cell proteins from premature denaturation/degradation or by
directly inhibiting a signal transduction pathway leading to programmed cell death (Gabai VL. et al.,
1998, FEBS Lett. 438:1-4, the disclosure of which is incorporated herein by reference in its entirety).
Those conditions include, but are not limited to, infarction, heart surgery, stroke, neurodegenerative
diseases, epilepsy, trauma, atherosclerosis, restenosis after angioplasty, and nerve damage. For example,
it is known that hypoxic stress is a signal that increases the amount of Hsp70 in cardiac tissue, whereupon
Hsp70 helps cells survive by binding to partially denatured proteins and assisting in the refolding of these
proteins into more stable native structures. Such assistance would be extremely important in providing
protection to the heart during periods of hypoxia such as during an infarct or during surgery when blood
flow to the heart may be temporarily halted. Several groups have also shown that overproduction of
Hsp70 leads to protection in several different models of nervous system injury (reviewed in Midori AY et
al., 1999, Mol. Med. Today, 5:525-31, the disclosure of which is incorporated herein by reference in its
entirety). Therapeutic methods for administering the protein of SEQ ID NO:389 or a fragment thereof
include, but are not limited to, those disclosed in Patent WO 00/23093, the disclosure of which is
incorporated herein by reference in its entirety.

Accordingly, it may be desirable to increase or decrease the level of the protein of SEQ ID
NO:389 in an individual having a condition resulting from an increased or decreased level of the protein.
In such embodiments, the protein of SEQ ID NO:389, or a fragment thereof, is administered to an
individual in whom it is desired to increase or decrease any of the foregoing activities. The protein of
SEQ ID NO:389 or fragment thereof may be administered directly to the individual or, alternatively, a
nucleic acid encoding the protein of SEQ ID NO:389 or a fragment thereof may be administered to the
individual. Alternatively, an agent which increases the activity of the protein of SEQ ID NO:389 may be
administered to the individual. Such agents may be identified by contacting the protein of SEQ ID
NO:389 or a cell or preparation containing the protein of SEQ ID NO:389 with a test agent and assaying
whether the test agent increases the activity of the protein. For example, the test agent may be a chemical
compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:389 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:389 may be identified by contacting the protein of SEQ ID NO:389 or a cell
or preparation containing the protein of SEQ ID NO:389 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

Protein of SEQ ID NO:250 (internal designation 105-053-4-0-E8-CS)

The protein of SEQ ID NO:250 is encoded by the cDNA of SEQ ID NO:9. It will be appreciated
that all characteristics and uses of the polypeptide of SEQ ID NO:250 described throughout the present
application also pertain to the polypeptide encoded by the human cDNA of clone 105-053-4-0-E8-CS. In
addition, it will be appreciated that all characteristics and uses of the nucleic acid of SEQ ID NO:9
described throughout the present application also pertain to the human cDNA of clone
105-053-4-0-E8-CS. The protein of SEQ ID NO:250 is found in prostate and exhibits extensive homologies to stretches
of pancreatic zymogen granule membrane protein GP2 (Glycoprotein-2). In particular, the protein of
SEQ ID NO:250 exhibits homologies to the GP2 proteins of human (SWISS-PROT accession number
P55259, the disclosure of which is incorporated herein by reference in its entirety), rat (SWISS-PROT
accession number P19218, the disclosure of which is incorporated herein by reference in its entirety) and
dog (SWISS-PROT accession number P25291, the disclosure of which is incorporated herein by
reference in its entirety). In fact, the amino acid sequence of SEQ ID NO:250 is completely identical to
that of the human GP2 sequence except that the protein of SEQ ID NO:250 is missing amino acids 62 to
484 of the human GP2 sequence. The protein of SEQ ID NO:250 contains two hydrophobic regions,
namely the N-terminal signal peptide (amino acid residues 8-28) and the C-terminal transmembrane
domain (amino acid residues 91-111).
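
The relationship stated above, identity to a longer sequence apart from a missing internal stretch, can be illustrated with a short sketch. It assumes 1-based, inclusive residue numbering, under which removing residues 62 to 484 deletes 484 - 62 + 1 = 423 amino acids; the function name `delete_residues` and the 500-residue placeholder sequence are hypothetical and do not represent the human GP2 sequence.

```python
def delete_residues(sequence: str, first: int, last: int) -> str:
    """Remove residues first..last (1-based, inclusive) from a protein sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    return sequence[: first - 1] + sequence[last:]


# Toy 500-residue placeholder, not a real protein sequence.
placeholder = "A" * 61 + "G" * 423 + "C" * 16
shortened = delete_residues(placeholder, 62, 484)
print(len(placeholder) - len(shortened))  # prints 423
print(shortened == "A" * 61 + "C" * 16)   # prints True
```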

GP2 (Glycoprotein-2) is the major membrane glycoprotein of secretory zymogen granule (ZG)
membranes within pancreatic acinar cells (Fukuoka et al. 1990 Nuc. Acids Res., 18:5900; Fukuoka et al.
1991 Proc. Natl. Acad. Sci. USA, 88:2898-2902; Fukuoka et al. 1992 Proc. Natl. Acad. Sci. USA,
89:1189-1193; Freedman et al. 1993 Eur. J. Cell Biol. 61:229-238; Scheele et al. 1994 Pancreas
9:139-149; Freedman et al. 1994 Annals N.Y. Acad. Sci. 713:199-206, the disclosures of which are incorporated
herein by reference in their entireties). GP2 homologues are also widely distributed among diverse
epithelial tissues known to possess regulated secretory processes, including parotid, submandibular gland,
stomach, liver and lung (Fukuoka et al. 1992 Proc. Natl. Acad. Sci. USA, 89:1189-1193).

In addition to ZG membranes, GP2 is also located in pancreatic acinar cells in rough endoplasmic 
reticulum, Golgi, trans-Golgi components, condensing vacuoles, apical plasma membranes (APM), 
basolateral plasma membranes (BPM), and within ZGs and acinar lumina (Scheele et al., 1994 Pancreas
9:139-149). GP2 is linked to the membrane of the ZG via a glycosylphosphatidyl inositol-anchor
(GPI-anchor) (Fukuoka et al. 1991 Proc. Natl. Acad. Sci. USA, 88:2898-2902; Lebel and Beattie 1988
Biochem. Biophys. Res. Comm. 254:1189-93, the disclosures of which are incorporated herein by 
reference in their entireties) and forms complexes, usually tetrameric complexes, below a pH of about 6.5. 

During assembly of secretory granules within the trans-Golgi network (TGN), the low pH of the
TGN causes formation of GP2 complexes. These complexes bind to proteoglycans (PG), forming a
fibrillar GP2/PG meshwork on the lumenal surface of the ZG. The GP2/PG matrix may function in 
membrane sorting within the TGN, assembly of ZG membranes, inactivation of ZG membranes during 
granule storage, and regulation of ZG membrane trafficking at the apical plasma membrane. The GP2/PG
matrix may also protect the lumenal aspect of the granule membrane from contact with secretory enzymes
contained within the granules and facilitate the specific release of secretory enzymes during exocytosis at 
the apical plasma membrane. 

The enzymes and the acidic milieu contained in the ZG are released into the lumen of the
pancreas through exocytosis by acinar cells. The pH at the apical plasma membrane of the acinar cells,
and of the pancreatic lumen in general, is maintained at an essentially neutral or alkaline pH by the fluid
and bicarbonate secreted by pancreatic ductal cells. The increased pH at the apical plasma membrane
(relative to the acidic pH within the ZG) optimizes the conditions for enzymatic cleavage of the GPI
anchor of GP2, resulting in release of GP2 and GP2/PG complexes from the apical membrane (Scheele et
al. (1994) Pancreas 9:139-149, the disclosure of which is incorporated herein by reference in its entirety).
The form of GP2 produced by GPI-anchor cleavage is termed globular GP2 (gGP2).

It is believed that the protein of SEQ ID NO:250 is a GP2 protein, and is thus likely involved in 
regulated membrane trafficking along apical secretory processes in a variety of epithelial cells. 

Accordingly, the present invention includes the use of the protein of SEQ ID NO:250, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity in the modulation of membrane sorting
within the trans-Golgi network, assembly of zymogen granule membranes, inactivation of zymogen
granule membranes during granule storage, regulation of zymogen granule membrane trafficking at the
apical plasma membrane, and release of secretory enzymes during exocytosis at the apical plasma membrane.
In such embodiments, the protein of SEQ ID NO:250, or a fragment thereof, is administered to an
individual in whom it is desired to increase or decrease any of the foregoing activities. The protein of
SEQ ID NO:250 or fragment thereof may be administered directly to the individual or, alternatively, a
nucleic acid encoding the protein of SEQ ID NO:250 or a fragment thereof may be administered to the
individual. Alternatively, an agent which increases the activity of the protein of SEQ ID NO:250 may be
administered to the individual. Such agents may be identified by contacting the protein of SEQ ID
NO:250 or a cell or preparation containing the protein of SEQ ID NO:250 with a test agent and assaying
whether the test agent increases the activity of the protein. For example, the test agent may be a chemical
compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:250 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:250 may be identified by contacting the protein of SEQ ID NO:250 or a cell or
preparation containing the protein of SEQ ID NO:250 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify tissues, preferably pancreas and
prostate, or to distinguish between two or more possible sources of a tissue sample on the basis of the
level of the protein of SEQ ID NO:250 in the sample. For example, the protein of SEQ ID NO:250 or
fragments thereof may be used to generate antibodies using any techniques known to those skilled in the
art, including those described herein. Such tissue-specific antibodies may then be used to identify tissues
of unknown origin, for example, forensic samples or differentiated tumor tissue that has metastasized to
foreign bodily sites, or to differentiate different tissue types in a tissue cross-section using
immunochemistry. In such methods a tissue sample is contacted with the antibody, which may be
detectably labeled, under conditions which facilitate antibody binding. The level of antibody binding to
the test sample is measured and compared to the level of binding to control cells from pancreas or prostate
or tissues other than pancreas or prostate to determine whether the test sample is from pancreas or
prostate. Alternatively, the level of the protein of SEQ ID NO:250 in a test sample may be measured by
determining the level of RNA encoding the protein of SEQ ID NO:250 in the test sample. RNA levels
may be measured using nucleic acid arrays or using techniques such as in situ hybridization, Northern
blots, dot blots or other techniques familiar to those skilled in the art. If desired, an amplification reaction,
such as a PCR reaction, may be performed on the nucleic acid sample prior to analysis. The level of RNA
in the test sample is compared to RNA levels in control cells from pancreas or prostate or tissues other
than pancreas or prostate to determine whether the test sample is from pancreas or prostate.

In another embodiment, antibodies to the protein of the invention or part thereof may be used for
detection, enrichment, or purification of membranes or zymogen granules using any techniques known to
those skilled in the art. For example, an antibody against the protein of SEQ ID NO:250 or a fragment
thereof may be fixed to a solid support, such as a chromatography matrix. A preparation containing
membranes or zymogen granules is placed in contact with the antibody under conditions which facilitate
binding to the antibody. The support is washed and then the membranes or zymogen granules are
released from the support by contacting the support with agents which cause the membranes or zymogen
granules to dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:250 or a fragment
thereof may be used to diagnose disorders associated with altered intercellular communication or
secretion. In such techniques, the level of the protein of SEQ ID NO:250 in a patient is measured using
techniques such as those described herein. The level of the protein of SEQ ID NO:250 in the patient is
compared to the level in control individuals. An elevated level or decreased level of the protein of SEQ
ID NO:250 relative to control individuals suggests that the patient is suffering from a defect in
intercellular communication or secretion.

In another embodiment, the protein of SEQ ID NO:250 or a fragment thereof is used to facilitate 
or decrease exocytosis. For example, the protein of SEQ ID NO:250 or fragment thereof may be used to 
increase or decrease the release of secretory enzymes within pancreatic acinar cells or prostatic cells. 
Accordingly, the protein of the invention or part thereof may be used to diagnose, treat and/or prevent
disorders associated with abnormal membrane trafficking including but not limited to viral or other
infections, traumatic tissue damage, and hereditary diseases such as pancreatitis or prostatitis, invasive
carcinomas and lymphomas. In such methods, the protein of SEQ ID NO:250, a fragment of the protein
of SEQ ID NO:250, or an agent which increases or decreases the activity of the protein of SEQ ID
NO:250 is administered to an individual using techniques such as those described herein. 

In another embodiment, the invention relates to methods of using the protein of SEQ ID NO:250
or a fragment thereof in the diagnosis of pancreatitis or prostatitis by detecting an elevation in the level of
the protein of SEQ ID NO:250 in a sample of bodily fluid, such as human blood, serum, or urine. The
protein may be detected using any method known to those skilled in the art, including those described
herein. In some embodiments, the protein of SEQ ID NO:250 or fragment thereof may be detected using
the methods described in U.S. Patent Nos. 5,436,169 or 5,663,315, the disclosures of which are incorporated
herein by reference in their entireties.

References:

U.S. Patent Nos. 5,436,169; 5,663,315
Nucleic Acids Research 18(9):5900 (1990)
Proc. Natl. Acad. Sci. USA 88(7):2898-2902 (1991)
Proc. Natl. Acad. Sci. USA 89:1189-1193 (1992)
Eur. J. Cell Biol. 61:229-238 (1993)
Freedman et al., Annals N.Y. Acad. Sci. 713:199-206 (1994)
Scheele et al., Pancreas 9(2):139-149 (1994)

Protein of SEQ ID No: 274 (internal designation: 145-56-3-0-D5-CS)

The protein of SEQ ID No: 274, encoded by the cDNA of SEQ ID No: 33, is homologous to the
human RNA 3'-terminal phosphate cyclase-like protein 1 (Rcl1) (trEMBL accession number CAB89811),
which is abundant in the nucleolus.

The RNA 3'-terminal phosphate cyclase, an enzyme originally identified in extracts from human
HeLa cells and Xenopus oocyte nuclei, catalyzes the ATP-dependent conversion of the 3'-terminal
phosphate group into a 2',3'-cyclic phosphodiester at the 3'-end of RNA, resulting in the activation of the
3' end of RNA molecules. Database searches showed that genes encoding proteins similar to the human and
E.coli RNA 3'-terminal phosphate cyclases are conserved among eukarya, bacteria and archaea,
arguing for an essential function of the enzyme in RNA metabolism (Genschik P. et al. - EMBO J -
1998, 16, p.2955-2967). Similarly, analysis of the human RNA 3'-terminal phosphate cyclases and
related proteins from other organisms indicated that they can be divided into 2 subfamilies referred to as
RNA 3'-terminal phosphate cyclases (Rtc) and RNA 3'-terminal phosphate cyclase-like proteins (Rcl).
These 2 subfamilies share several sequence elements, including a nearly universally conserved amino acid
sequence RGxxPxGGGx@ (where x stands for any, and @ for hydrophobic amino acids), designated
originally as the cyclase signature, which corresponds to the Prosite signature. Although structurally
slightly different, these 2 subfamilies of proteins have the same function and are involved in RNA
metabolism. The cyclase signature is present in the protein of the invention (positions 157 to 167). In
addition, this protein also displays other characteristic signatures of RNA 3'-terminal phosphate cyclase
proteins (pfam signature from positions 1 to 368 and eMotif signatures from positions 12 to 44 and from
positions 157 to 168).
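
By way of non-limiting illustration, a search for the cyclase signature RGxxPxGGGx@ in an amino acid sequence may be sketched as a regular expression. The set of residues treated as hydrophobic (`HYDROPHOBIC`), the function name `find_cyclase_signature`, and the toy sequence are assumptions made for the purpose of the example; the toy sequence is not SEQ ID No: 274.

```python
import re

# 'x' in the signature stands for any amino acid and '@' for a hydrophobic
# one. Which residues count as hydrophobic is an assumption of this sketch.
HYDROPHOBIC = "AVLIMFWYC"
CYCLASE_SIGNATURE = re.compile(r"RG..P.GGG.[" + HYDROPHOBIC + "]")


def find_cyclase_signature(sequence):
    """Return 1-based, inclusive (start, end) positions of each match."""
    return [(m.start() + 1, m.end()) for m in CYCLASE_SIGNATURE.finditer(sequence)]


# Hypothetical toy sequence containing one copy of the 11-residue signature.
toy_sequence = "MK" + "RGAAPAGGGAL" + "TT"
print(find_cyclase_signature(toy_sequence))
```

The signature spans 11 residues, which is consistent with the stated positions 157 to 167.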

3'-terminal phosphate cyclases (Rtc and Rcl) catalyze the conversion of 3'-terminal phosphate to
a 2',3'-cyclic phosphodiester in a reaction dependent on ATP, other nucleoside triphosphates being much
less active co-factors. With both enzymes, the cyclization of the 3'-phosphate at the 3'-end of RNA
occurs by a three-step mechanism as follows: (a) adenylation of the enzyme by ATP; (b) the enzyme acts
on RNA-N3'P to produce RNA-N3'PP5'A; (c) a non-catalytic nucleophilic attack by the adjacent
2'-hydroxyl on the phosphorus in the diester linkage to produce the cyclic end product.

RNA 3'-terminal phosphate cyclase proteins are involved in RNA processing. It has been
demonstrated that several eukaryotic and prokaryotic RNA ligases require 2',3'-cyclic phosphate RNA
ends, which suggests that the enzyme is involved in the generation or maintenance of cyclic termini in RNA
ligation substrates. These ligases include 2 tRNA-splicing ligases, and the prokaryotic RNA ligase of
unknown function that joins RNA ends via an atypical 2',5'-phosphodiester linkage (Arn E. et al. - RNA
Structure and Function - Cold Spring Harbor Laboratory Press - 1998 p.695-726). The involvement of
these ligases in nuclear pre-tRNA splicing is well documented (Zillmann et al. - Mol Cell Biol - 1991, 11,
p5410-5416) (Phizicky E. et al. - J Biol Chem, 1992, 267, p4577-4582) but these enzymes might also
function in the ligation of virusoids and viroids (Branch A. et al. - Science - 1982, 217, p1147-1149)
(Kibertis et al. - EMBO J - 1985, 4, p817-827).

Alternatively, the cyclase could be responsible for producing cyclic phosphate 3'-ends identified
in the spliceosomal U6 small nuclear RNA and some other small RNAs. Furthermore, in yeast, Rcl is
associated with U3 small nucleolar RNP (U3 snoRNP), a central component of the 18S ribosomal RNA
(rRNA) processing machinery in yeast and vertebrates (Billy E. et al. - EMBO J - 2000, 19,
p2115-2126). However, it seems that Rcl is not a structural component of U3 snoRNP and that its
association with U3 snoRNP occurs, most probably, in large macromolecular complexes representing
nascent ribosomes. In yeast, depletion or inactivation of Rcl causes a defect in 18S rRNA synthesis, which
leads to decreased levels of 40S ribosomal subunits, resulting in an accumulation of free 60S subunits and
a fall in the amount of polysomes. In yeast, the 18S, 5.8S and 25S rRNAs are derived from a long 35S
precursor. This 35S pre-rRNA is normally cleaved at the A0 site, yielding 33S pre-rRNA. 33S pre-rRNA is
then processed rapidly at sites A1 and A2 to generate 20S pre-rRNA, which is further processed into mature
18S rRNA. Deletion or inactivation of Rcl leads to inhibition of processing at sites A0, A1 and A2 (Billy
E. et al. - EMBO J - 2000, 19, p2115-2126).

It is believed that the protein of SEQ ID No: 274 or part thereof is involved in RNA processing,
probably as an RNA 3'-terminal phosphate cyclase. Preferred polypeptides of the invention are
polypeptides comprising amino acids 157 to 167, 1 to 368, 12 to 44 and 157 to 168. Other preferred
polypeptides of the invention are fragments of SEQ ID No: 274 having any of the biological activities
described herein. Assays of cyclase activity can be carried out using the Norit method as described in the
article by Filipowicz (Filipowicz W et al. - Methods Enzymol. - 1990, 181, p.499-510), which disclosure
is hereby incorporated by reference in its entirety, or any other techniques known to those skilled in the art.

Thus, an embodiment of the present invention relates to compositions and methods of using the
protein of the invention or part thereof in in vitro RNA manipulation to isolate small nucleolar RNPs,
especially, but not limited to, U3 snoRNP from biological samples, using immunoprecipitation techniques
(Billy E. et al. - EMBO J - 2000, 19, p2115-2126), which disclosure is hereby incorporated by reference
in its entirety, or any other techniques known to those skilled in the art.

In another embodiment, the protein of the invention or part thereof is used to develop antagonists
of the protein of the invention or part thereof in order to inhibit or decrease cellular proliferation. This
can be explained by the fact that the protein of the invention or part thereof is probably involved in rRNA
maturation; thus the use of products that inhibit rRNA maturation prevents the formation of functional
ribosomes, which leads to an inhibition of protein synthesis. Cells that are unable to synthesize proteins
stop growing and ultimately die because they are unable to regenerate proteins. One preferred
embodiment of the invention pertains to the use of the protein of the invention or part thereof to develop
these antagonists, which are added to samples or materials as a "cocktail" in association with other
antimicrobial substances to stop and/or prevent proliferation of undesired contaminants. For example, the
protein of the invention or part thereof may be used to inhibit the proliferation of undesired bacteria and/or
viruses in in vitro cultures. In another preferred embodiment of the invention, the protein or part thereof
could be used to develop antagonists that could be administered to patients suffering from viral and/or
bacterial infection, particularly viral infections by viruses such as HIV and HCV. This could, for example,
be accomplished by targeting the antagonists to cells infected by the virus or directly to bacteria. Once
inside these cells, the antagonist will inhibit or at least decrease protein synthesis, resulting in an inhibition
or a decrease in bacterial and/or viral replication. In yet another preferred embodiment of the invention,
the protein or part thereof could be used to develop antagonists that could be administered to patients in
order to inhibit abnormal and/or unregulated cellular proliferation found in diseases such as cancers,
psoriasis, systemic lupus erythematosus (SLE), arthritis, endometriosis, enteropathy in immunodeficiency
virus infection, venous eczema (inducing connective tissue sclerosis in lipodermatosclerosis and causing
the reduced reepithelialization tendency in venous ulcers), chronic irritant contact dermatitis (CICD),
adult polycystic kidney disease (APKD), ichthyosis, and cholesteatoma.

Protein of SEQ ID NOs:303 (internal designation number 187-31-0-0-F12-CS) and 275 (internal
designation number 145-59-2-0-A7-CS)

The 148-amino-acid long protein of SEQ ID NO:303, encoded by the cDNA of SEQ ID NO:62,
found in fetal kidney and highly expressed in this organ, is homologous to the human RNA-associated
protein HSCP250 (SPTREMBLNEW SPTREMBL SWISSPROT accession number AAF36170 and
GENESEQP accession number Y84433). In addition, this protein displays significant homology to the
ribosomal L27 protein of D. melanogaster (GENPEPT GENPEPTNEW accession number AE003576)
and to the 50S ribosomal L27 protein of E.coli (SWISSPROT accession number P02427). The protein of
SEQ ID NO:303 has a putative signal peptide, from amino acid position 13 to 27. According to the
PFAM program, the protein of the invention also presents a ribosomal L27 protein signature in position
31 to 81. Amino acid residues in position 64 to 78 are highly similar to the consensus pattern:
G-X-[LIVM](2)-X-R-Q-R-G-X(5)-G, where X is any amino acid (the motif found in the protein of SEQ ID
NO:303 is G-X-I-I-X-T-Q-R-H-X(5)-G). Potential phosphorylation sites exist in positions 32, 38, 47 (S
residues), 60 (Y residue), 69 and 141 (T residues). One of them, the T residue in
position 69, is embedded in the ribosomal L27 protein signature described above.
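
By way of non-limiting illustration, the consensus pattern given above may be converted into a regular expression and tested against candidate 15-residue windows. The helper name `prosite_to_regex` and the toy peptides are hypothetical. The converter handles only the pattern elements that appear in the consensus above (single residues, X, bracketed alternatives, and fixed repeat counts) and is not a complete implementation of the PROSITE pattern syntax.

```python
import re

CONSENSUS = "G-X-[LIVM](2)-X-R-Q-R-G-X(5)-G"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(X|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        residue, repeat = m.groups()
        token = "." if residue == "X" else residue  # X means any residue
        if repeat:
            token += "{" + repeat + "}"
        parts.append(token)
    return "".join(parts)


regex = prosite_to_regex(CONSENSUS)
print(regex)

# Hypothetical 15-residue windows: the first obeys the strict consensus, the
# second carries T and H where the consensus has R and G.
print(bool(re.fullmatch(regex, "GAIVARQRGAAAAAG")))
print(bool(re.fullmatch(regex, "GAIIATQRHAAAAAG")))
```

A window of the second kind does not match the strict consensus, which is why the text describes the motif of SEQ ID NO:303 as highly similar to the pattern rather than identical to it.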

The protein of SEQ ID NO:275, encoded by the cDNA of SEQ ID NO:34, is a 94-amino-acid
long variant of the SEQ ID NO:303 protein. While the first 81 amino acid residues of protein of SEQ ID
NO:275 are strictly homologous to the first 81 amino acid residues of protein of SEQ ID NO:303, the 13
subsequent amino acids are different. In addition to the putative signal peptide (position 13 to 27), the
ribosomal L27 protein signature (position 64 to 78), and phosphorylation sites (positions 32, 38, 47, 60
and 69), the protein of SEQ ID NO:275 also displays a candidate membrane-spanning segment in position
74 to 94.

Ribosomal protein L27 is one of the proteins of the large ribosomal subunit. L27 belongs to a
family of ribosomal proteins which, on the basis of sequence similarities, includes: eubacterial L27, plant
chloroplast L27 (nuclear-encoded), algal chloroplast L27 and yeast mitochondrial YmL2 (gene MRPL2
or MRP7). Among the different ribosomal L27 proteins characterized so far, the one of E.coli is probably
the best studied. Protein L27 is one of the smallest and the most basic polypeptides in the E.coli ribosome.
Techniques like the measurement of protein exposure by hot tritium bombardment have shown that L27
of the large subunit is well exposed on the surface of the E.coli 70S ribosome (Agafonov et al., Proc.
Natl. Acad. Sci. 94:12892-12897 (1997)). Chemical and UV-crosslinking studies have demonstrated that
L27 is closely associated with domain V of the 23S rRNA, a region that comprises part of the peptidyl
transferase center (Osswald et al., Nucleic Acids Res. 18:6755-6760 (1990)). Direct evidence for the
presence of L27 at the peptidyl transferase center was obtained through the use of derivatives of tRNAPhe
containing photoreactive azidonucleotides within the 3'-terminal ACCA-OH sequence (Wower et al., Proc.
Natl. Acad. Sci. 86:5232-5236 (1989)). Analysis of a mutant E.coli strain in which the rpmA gene, which
encodes L27, was replaced by a kanamycin marker, has suggested that L27 contributes to peptide bond
formation by facilitating the proper placement of the acceptor end of the A-site tRNA at the peptidyl
transferase center (Wower et al., J. Biol. Chem. 273:19847-19852 (1998)). Further, recent studies
conducted by Thiede and collaborators have precisely determined RNA-protein contact sites in the 50S
ribosomal subunit of E.coli (Thiede et al., Biochem. J. 334:39-42 (1998)), showing that Lys-71 and
Lys-74 of L27 interact with U-2334 of the 23S rRNA.

It is believed that the proteins of SEQ ID NOs:303 and 275 are human RNA-associated proteins.
Preferred polypeptides of SEQ ID NO:303 are polypeptides comprising the amino acids from positions 13
to 27, 64 to 78 and amino acid residues in positions 32, 38, 47, 60, 69 and 141. It is believed that the
protein of SEQ ID NO:275 is a 94-amino-acid long variant of the 148-amino-acid protein of SEQ
ID NO:303. Preferred polypeptides of SEQ ID NO:275 are polypeptides comprising the amino acids
from positions 13 to 27, 64 to 78, 74 to 94 and amino acid residues in positions 32, 38, 47, 60, and 69.
Other preferred polypeptides of the invention are fragments of SEQ ID NO:303 or 275 having any of the
biological activities described herein.

One embodiment of the present invention involves the use of the present proteins and nucleic
acids to specifically identify cells from the kidney, especially from the fetal kidney. Such cells can be
detected by virtue of their strong expression of the protein of the invention, and can thus be detected using
any standard method for detecting protein expression or activity, including methods involving antibodies,
specific nucleic acids, or any other detectable molecule that specifically binds to the polypeptides or
polynucleotides of the invention. An ability to specifically detect kidney cells is useful, e.g. for
determining the identity of tumor cells as well as for the identification of specific cell types and tissues
for, e.g. histological analyses.

In another embodiment of the present invention, the present proteins are used as a component of 
in vitro eukaryotic translation systems. Such systems represent a widely used tool for protein production 
with many academic and industrial applications. Similarly, inhibitors of the protein of the invention, e.g. 
antibodies or dominant negative forms of the protein, can be used to inhibit in vitro translation systems,
e.g. to specifically stop a translation reaction involving a eukaryotic cell extract.

In another embodiment, the proteins of SEQ ID NO: 303 or 275 can be used to bind to nucleic 
acids, preferably RNA, alone or in combination with other substances. For example, the proteins of the 
invention or part thereof can be added to a sample containing RNAs in optimum conditions for binding, 
and allowed to bind to RNAs. In a preferred such embodiment, the proteins of the invention or part
thereof may be used to purify mRNAs, for example to specifically isolate RNA, e.g. from a specific cell
type or from cells grown under particular conditions. Such RNAs could then be reverse transcribed and
cloned, or analyzed for relative expression, etc. In addition, such methods may be used to
specifically remove RNA from a sample, for example during the purification of DNA. To carry out any 
of these methods, the proteins of the invention or part thereof may be bound to a chromatographic
support, either alone or in combination with other RNA binding proteins, to form an affinity
chromatography column. A sample containing a mixture of nucleic acids to purify is then run through the 
column. Immobilizing the proteins of the invention or part thereof on a support is particularly 
advantageous for embodiments in which the method is to be practiced on a commercial scale. This 
immobilization facilitates the removal of RNAs from the batch of resin-coupled protein after binding, and
allows subsequent re-use of the protein. Immobilization of the proteins of the invention or part thereof
can be accomplished, for example, by inserting any matrix binding domain in the protein according to 
methods known to those skilled in the art. The resulting fusion product including the proteins of the 
invention or part thereof is then covalently, or by any other means, bound to a protein, carbohydrate or 
matrix (such as gold, "Sephadex" particles, polymeric surfaces). 

Still another embodiment of the invention relates to methods of preparing antibodies directed
against the proteins of the invention or part thereof. Such antibodies may be used, e.g., in
co-immunoprecipitation experiments to separate and purify RNAs associated with the proteins of the
invention. To accomplish this, in a sample containing a mixture of nucleic acids, antibodies directed
against the protein of the invention may be added in association with protein A or protein G sepharose
beads. Immunoprecipitation conditions are well known to those skilled in the art.

The invention further relates to methods and compositions used to modify the proteins of the
invention. In a preferred embodiment, K residues of the proteins of the invention are replaced by
other basic residues (R residues), as some of these K residues seem to be crucial for RNA
interactions (Thiede et al., Biochem. J. 334:39-42 (1998)). Conversely, R residues of the proteins of the
invention may be replaced by K residues. These substitutions are predicted to change the specificity
and/or the affinity of the proteins of the invention for RNA molecules. Another preferred embodiment
may be to perform post-translational modifications of the proteins of the invention, notably at the level of
the putative phosphorylation sites described above in positions 32, 38, 47, 60, 69 and 141 of SEQ ID
NO:303. By adding negative charges to the proteins of the invention, phosphorylation of these sites may
modulate the affinity of the protein for RNA molecules. Phosphorylation of the T residue in position 69 is of
great interest, as it is embedded in the ribosomal L27 protein signature.
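
By way of non-limiting illustration, the enumeration of single K to R (or, conversely, R to K) substitution variants may be sketched as follows. The function name `single_substitutions` and the toy peptide are hypothetical; the peptide is not a fragment of SEQ ID NO:303 or 275.

```python
def single_substitutions(sequence, old="K", new="R"):
    """Yield (position, variant) pairs, one per occurrence of `old`, with
    that single residue replaced by `new`. Positions are 1-based."""
    for index, residue in enumerate(sequence):
        if residue == old:
            yield index + 1, sequence[:index] + new + sequence[index + 1:]


toy_peptide = "MKAGKR"  # hypothetical peptide used only for illustration
for position, variant in single_substitutions(toy_peptide):
    print(position, variant)
```

Calling the same function with `old="R"` and `new="K"` enumerates the converse substitutions.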

In another preferred embodiment, the proteins of the invention or part thereof may be used to
visualize RNAs, when the polypeptides are linked to an appropriate fusion partner, or are detected by
probing with an antibody.

Another embodiment of the present invention relates to methods and compositions using the 
proteins of the invention, or part thereof, to associate specific mRNAs to the inner face of lipidic bilayers 
of liposomes in order to further introduce these mRNAs into the cytoplasm of eukaryotic cells. For 
example, as described above, the protein of the invention of SEQ ID NO:275 displays both a candidate
membrane-spanning segment in position 74 to 94 (at its very carboxy-terminal part), and a ribosomal L27
protein signature in position 64 to 78. Moreover SEQ ID NO:275 is an RNA binding protein. Preferably, 
specific mRNAs are first associated with the protein of the invention and the RNA/protein complex 
formed in that way is then mixed with liposomes according to methods known to those skilled in the art. 
These liposomes are added to an in vitro culture of eukaryotic cells. In vivo, such a method might treat
and/or prevent disorders linked to dysregulation of gene transcription such as cancer and other disorders
relating to abnormal cellular differentiation, proliferation, or degeneration. 

In another embodiment, the present proteins and nucleic acids can be used to modulate the rate of 
cell growth in vitro or in vivo. Studies in Drosophila have shown that a decrease in ribosome function 
results in a significant inhibition of cell growth. Accordingly, compounds that inhibit the expression or
function of the proteins of the invention can be used to inhibit the growth rate of cells, and can thus be
used, e.g. in the treatment or prevention of diseases or conditions associated with excessive cell growth,
such as cancer or inflammatory conditions. Such compounds include, but are not limited to, antibodies,
antisense molecules, dominant negative forms of the proteins, and any heterologous compounds that
inhibit the expression or the activity of the proteins.

Protein of SEQ ID NO:269 (internal designation 116-115-2-0-F8-CS)

The protein of SEQ ID NO:269, encoded by the cDNA of SEQ ID NO:28, shows homology with the
mink whale ribonuclease A (Emmens M., et al., Biochem. J. 157:317-323 (1976)), a member of the
pancreatic ribonuclease family. In addition, the protein of the invention exhibits 2 membrane-spanning
segments, the first from amino acid positions 1-21, the second from amino acid positions 179-199. The
cDNA of SEQ ID NO:28 is composed of 3 exons. Exon 1 is encoded by nucleotides 1-225, exon 2 by
nucleotides 226-288, and exon 3 by nucleotides 289-597. The protein of the invention is highly
expressed in the testis.
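
By way of non-limiting illustration, the exon boundaries given above may be checked and used to extract exon sequences as sketched below. The coordinates are those stated in the text (1-based, inclusive); the function names and the placeholder sequence are hypothetical, and the placeholder is not the sequence of SEQ ID NO:28.

```python
# Exon boundaries as stated in the text (1-based, inclusive nucleotides).
EXONS = {"exon 1": (1, 225), "exon 2": (226, 288), "exon 3": (289, 597)}


def exon_lengths(exons):
    """Return the length in nucleotides of each exon."""
    return {name: end - start + 1 for name, (start, end) in exons.items()}


def are_contiguous(exons):
    """Check that each exon starts immediately after the previous one ends."""
    spans = sorted(exons.values())
    return all(prev_end + 1 == next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))


def slice_exon(cdna, start, end):
    """Convert 1-based inclusive coordinates into a Python slice."""
    return cdna[start - 1:end]


placeholder_cdna = "A" * 597  # stand-in sequence of the stated total length
print(exon_lengths(EXONS))
print(are_contiguous(EXONS))
print(len(slice_exon(placeholder_cdna, *EXONS["exon 2"])))
```

The three exons are 225, 63 and 309 nucleotides long, which together account for all 597 nucleotides covered by the stated coordinates.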

Ribonucleases are proteins that catalyze the hydrolysis of phosphodiester bonds in RNA chains.
Pancreatic ribonucleases are pyrimidine-specific ribonucleases present in high quantity in the pancreas of
a number of mammalian taxa and of a few reptiles. In addition to their function in the hydrolysis of RNA,
ribonucleases have evolved to support a variety of other physiological activities. Such activities include
anti-parasite, anti-bacterial, anti-virus, and anti-neoplastic activities, as well as, in some cases, promoting
neurotoxicity and angiogenesis. For example, bovine seminal ribonuclease is anti-neoplastic (Laccetti, P.
et al. (1992) Cancer Res. 52: 4582-4586), and some frog ribonucleases display both anti-viral and
anti-neoplastic activity (Youle, R. J. et al. (1994) Proc. Natl. Acad. Sci. USA 91: 6012-6016; Mikulski, S. M.
et al. (1990) J. Natl. Cancer Inst. 82: 151-152; and Wu, Y.-N. et al. (1993) J. Biol. Chem. 268:
10686-10693). In addition, angiogenin is a tRNA-specific ribonuclease which binds actin on the surface of
endothelial cells for endocytosis and is then translocated to the nucleus where it promotes endothelial
invasiveness required for blood vessel formation (Moroianu, J. and Riordan, J. F. (1994) Proc. Natl.
Acad. Sci. USA 91: 1217-1221). Further, eosinophil-derived neurotoxin (EDN) and eosinophil cationic
protein (ECP) are related ribonucleases which possess neurotoxicity (Beintema, J. J. et al. (1988)
Biochemistry 27: 4530-4538; Ackerman, S. J. (1993) In Makino, S. and Fukuda, T., Eosinophils:
Biological and Clinical Aspects. CRC Press, Boca Raton, Fla., pp 33-74). ECP also exhibits cytotoxic,
anti-parasitic, and anti-bacterial activities. Finally, an EDN-related ribonuclease, RNase k6, is expressed
in normal human monocytes and neutrophils, suggesting a role for this ribonuclease in host defense
(Rosenberg, H. F. and Dyer, K. D. (1996) Nuc. Acid. Res. 24: 3507-3513).

It is believed that the protein of SEQ ID NO:269 is a ribonuclease, and is thus capable of
hydrolyzing ribonucleic acids, and is involved in a number of processes including defense against
infection and neoplasia, as well as in neurotoxicity and angiogenesis. Preferred polypeptides of the
invention are any fragments of SEQ ID NO:269 having any of the biological activities described herein.
The ribonuclease activity of the protein of the invention or part thereof may be assayed using any assay
known to those skilled in the art, including those described in US patent 5,866,119.

In one embodiment, the present polynucleotides and polypeptides are used to specifically detect 
testis tissue and cells derived from the testis, as the present protein is overexpressed in this tissue. For 
example, the protein of the invention or part thereof may be used to generate specific antibodies using
any technique known to those skilled in the art. Such tissue-specific antibodies may then be used to
identify tissues of unknown origin, such as in forensic samples, differentiated tumor tissue that has 
metastasized to foreign bodily sites, etc., or to differentiate different tissue types in a tissue cross-section 
using immunochemistry. 

The present invention relates to methods and compositions using the protein of the invention or
part thereof to hydrolyze one or several substrates, preferably nucleic acids, more preferably RNA, alone
or in combination with other substances. For example, the protein of the invention or part thereof is 
added to a sample containing a substrate(s) in conditions amenable to enzyme activity, and the protein 
thus catalyzes the hydrolysis of the substrate(s). 

In a preferred embodiment, the protein of the invention or part thereof may be used to remove
contaminating RNA in a biological sample, alone or in combination with other nucleases. In a more
preferred embodiment, the protein of the invention or part thereof is used to remove contaminating RNA
from DNA preparations, or to remove RNA templates prior to second strand synthesis and prior to analysis
of in vitro translation products. In one such embodiment, the protein of the invention or part thereof is
added to a biological sample as a "cocktail" along with other nucleases. The advantage of using a
cocktail of hydrolytic enzymes is that one is able to hydrolyze a wide range of substrates without knowing
the specificity of any of the enzymes, or even the identity of all of the substrates. Such cocktails of 
nucleases are commonly used in molecular biology assays, for example to remove unbound RNA in 
RNAse protection assays. Using a cocktail of hydrolytic enzymes also protects a sample from a wide 
range of future unknown RNA contaminants from a vast number of sources. For example, the protein of
the invention or part thereof is added to samples where contaminating substrates are undesirable.
Alternatively, the protein of the invention or part thereof may be bound to a chromatographic support, 
either alone or in combination with other hydrolytic enzymes, using techniques well known in the art, to 
form an affinity chromatography column. A sample containing the undesirable substrate is run through 
the column to remove the substrate. Immobilizing the protein of the invention or part thereof on a support
is particularly advantageous for those embodiments in which the method is to be practiced on a
commercial scale. This immobilization facilitates the removal of the enzyme from the batch of product
and subsequent reuse of the enzyme. Immobilization of the protein of the invention or part thereof can be
accomplished, for example, by inserting a cellulose-binding domain in the protein. One of skill in the art
will understand that other methods of immobilization could also be used and are described in the available
literature. Alternatively, the same methods may be used to identify new substrates. 

In another embodiment, the protein of the invention or part thereof may be used to decontaminate
or disinfect samples infected by undesirable parasites, bacteria and/or viruses using any of the methods
known to those skilled in the art, including those described in Youle et al. (1994), supra; Mikulski et al.
(1990), supra; and Wu et al. (1993), supra. In a preferred embodiment, the protein is used to eliminate RNA
viruses from a sample or in a patient.

In another embodiment, the present invention relates to compositions and methods using the 
protein of the invention or part thereof to selectively kill cells. The protein of the invention or part thereof 
is linked to a recognition moiety capable of binding to a chosen cell, such as lectins, receptors or
antibodies, thereby generating cell-specific cytotoxic reagents as described in US Patent No. 5,955,073,
the disclosure of which is incorporated herein by reference in its entirety.

In another embodiment, the protein of the invention or part thereof is used in the diagnosis, 
prevention and/or treatment of neoplastic disorders. In one such embodiment, cancer can be treated or 
prevented in a patient by increasing the activity of the present protein in the patient, particularly within 
20 neoplastic or hyperplastic cells within the patient. For example, a polynucleotide encoding the protein of 
the invention or part thereof, a polynucleotide encoding the protein, or a compound that causes an 
increase in the expression or activity of the protein, can be administered to the patient, or to cells derived 
from the patient, in vivo or ex vivo. Preferably, the protein, polynucleotide, or compound is specifically 
targeted to the neoplastic or hyperplastic cells, for example by intratumoral injection of the molecule or 
by linking the molecule to a targeting moiety, such as a tumor cell-specific antibody.

In another embodiment, cancer can be treated or prevented in a patient by inhibiting the 
expression or activity of the protein of the invention in endothelial cells of the patient, in particular within 
endothelial cells involved in angiogenesis. Such expression or activity can be inhibited in any of a 
number of ways, for example using antibodies, antisense sequences, ribozymes, dominant negative forms 
of the protein, as well as small molecule inhibitors of protein activity or expression.

In another embodiment, the present polynucleotide and polypeptide sequences are used to
diagnose cancer in a patient. In a typical such embodiment, a biological sample is obtained from a
patient, and the level of the present polypeptides or polynucleotides is detected and compared with a
control level, where a difference between the level observed in the patient and the control level indicates
the presence of cancer in the patient.

In another embodiment, the present protein is inhibited within cells of a mammal in order to
protect cells of the mammal against RNAse-associated neurotoxicity. In a typical such embodiment, the
level of the protein is detected within the cells of the patient, where an elevated level of the protein,
particularly within neurons, indicates a risk for neurotoxicity. The level of the expression or activity of 
the protein is subsequently inhibited using any standard method, such as antibodies, antisense molecules, 
ribozymes, dominant negative forms of the protein, or any other compounds that inhibit the expression or 
activity of the protein. Preferably, such inhibitors are specifically directed to the neurons of the mammal. 

Protein of SEQ ID NO: 390 (internal designation 116-118-4-0-A8-CS)

The present inventors have provided a new gene and protein described in SEQ ID No 149 and 
390 respectively, belonging to the carbonic anhydrase (CA) family, more particularly the alpha-CA 
family. This novel alpha-CA related gene is located on the human chromosome 17q24 region. 

Carbonic anhydrases (CA) (EC 4.2.1.1) are zinc metalloenzymes which catalyze the
reversible hydration of carbon dioxide. Nine different active alpha-carbonic anhydrases (alpha-CA) that
catalyze the hydration reaction have been found, as well as at least two alpha-CA-related enzymes. All
known carbonic anhydrases from the animal kingdom are alpha-CAs, as opposed to beta- and gamma-CAs,
which are also zinc-containing enzymes but are unrelated by sequence. The protein of SEQ ID No.
390 displays significant homology to the Pfam carbonic anhydrase domain between amino acids 20 and
59 of the protein, in particular the motif Gly-Ser-Glu-His at positions 45 to 48 of the protein, which has
been found to be highly conserved in a multi-alignment published by Lovejoy et al. (1998) (Genomics 54,
484-493). The chromosomal localization was found by BLAST alignment with a sequence mapping to
chromosome 17 (GenBank genomic fragment with accession number AC002090) and another alignment
with a genomic fragment (accession number AF064854) which maps the gene to the 17q24 region. The
polypeptide of SEQ ID No. 390 displays particularly high homology to a human protein called carbonic
anhydrase-related protein 10 (GenBank accession number AB036836, published directly in the database by
Adachi, K. and Nishimori, I.).

The human alpha-CAs contain a highly conserved catalytic site which comprises a zinc
coordination polyhedron defining an active site located in a large cone-shaped cavity that extends almost
to the center of the alpha-CA molecule. One side of the cavity is formed by hydrophobic residues, while
the other side contains hydrophilic residues, including Thr199 and Glu106 (referring to the CA II enzyme).
The zinc ion is located at the bottom of this cleft, and is tetrahedrally coordinated to the imidazoles of three
histidine residues (His94, His96 and His119, referring to the CA II enzyme) and to a water molecule called the
'zinc water' that ionizes to a hydroxide ion with a pK of about 7 (Sly et al., Ann. Rev. Biochem. 64:375-401
(1995)). Studies have shown that the Zn-OH-/Thr199/Glu106 network is important in binding
bicarbonate, sulfonamide inhibitors and many anionic inhibitors (Liljas et al., Eur. J. Biochem. 219:1-10).

Improved alpha-CA inhibitors 
Of the human alpha-CAs, it has been found that the various isozymes have differing tissue
distributions and intracellular localizations. Alpha-CA II, for example, is expressed in the cytosol of some
cell types in virtually every tissue or organ, while alpha-CA I is expressed in colon and erythrocytes, and
alpha-CA IV is expressed on the apical surfaces of epithelial cells of some segments of the nephron, the
apical plasma membrane in the lower gastrointestinal tract, and the plasma face of endothelial cells of
certain capillary beds. The protein of SEQ ID NO: 390 encoded by the cDNA of SEQ ID NO: 149 has
been found by the present inventors to be expressed in testes.

The human alpha-CAs have been found to be involved in a range of important biological
functions involving pH regulation, CO2 and HCO3- transport, ion transport and water and electrolyte
balance. Functions in which alpha-CAs are involved include H+ secretion, HCO3- reabsorption, HCO3-
secretion, bone resorption, and production of aqueous humor, cerebrospinal fluid, gastric acid and
pancreatic juice. Of particular medical interest, CA II has been found to be implicated in osteopetrosis, as
CA II defects have been found to be a cause of inherited osteopetrosis, found along with renal tubular
acidosis and brain calcification.

CA activity can be determined by well known means. Assays used to characterize CA isozyme
activity are provided for example in Khalifah, J. Biol. Chem. 246:2561-73 (1971); Chen et al., Biochem.
32:7861-65 (1993); Tu et al., J. Biol. Chem. 258:8867-8871 (1986); and Jewel et al., Biochem. 30:1484-1490
(1991).

Many different inhibitors of CA have been identified, and certain CA inhibitors have been 
developed as medicaments. CA inhibitors are currently a primary treatment for glaucoma, where 
inhibition of CA activity reduces intra-ocular pressure by inhibiting formation of aqueous humor.
Approved CA inhibitors for glaucoma include Acetazolamide (Diamox®), Methazolamide (Neptazane®), 
Dorzolamide (Trusopt®) and Brinzolamide (Azopt®). 

Improved broad-acting CA inhibitors 

In certain treatment settings, there is a need for CA inhibitors capable of inhibiting the broad class
of CA isozymes so as to inhibit CO2 hydration activity. Local (topical) use of CA inhibitors has been
found advantageous over systemic application for glaucoma, allowing systemic side effects of CA
inhibitors to be avoided. However, current CA inhibitors may have limited efficacy in terms of ability to
completely inhibit total CA activity. A topical treatment, Dorzolamide (Trusopt®), for example, is an
inhibitor of CA II, but only weakly inhibits CA VI (Hoyng et al., Drugs 50(3):411-434 (2000)).
Inhibitors of CA II may be unable to inhibit CA I or other CAs, which is thought to result in decreased
drug efficacy because other CAs can compensate for loss of CA II activity (Sly and Hu, Ann. Rev.
Biochem. 64:375-401 (1995)). In one example, CA I is five to six times as abundant as CA II in human
erythrocytes, but has only about 15% of the activity. Thus, CA I contributes about 50% of the total CA
activity (Dodgon et al., J. Appl. Physiol. 64:1492-80 (1988)). Moreover, CA I may have a different
inhibitor sensitivity profile from CA II, as CA I is less sensitive to sulfonamide inhibitors, for example.
CA II and CA IV, on the other hand, show significant resistance to inhibition with halide ions in
comparison to CA I (Sly et al. (1995), supra). Thus, a significant amount of residual CA activity in a cell
or tissue of interest may be due to other CAs, including the polypeptide of SEQ ID No. 390.
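
The erythrocyte figures given above can be checked with a short calculation. The following is an illustrative sketch only: it assumes that "about 15% of the activity" refers to the activity of CA I per unit of enzyme relative to CA II, and the function name is hypothetical.

```python
def ca1_share_of_total_activity(abundance_ratio, relative_specific_activity):
    """Fraction of total CA activity contributed by CA I.

    abundance_ratio: amount of CA I per unit amount of CA II.
    relative_specific_activity: activity of CA I per unit of enzyme,
        relative to CA II (assumed reading of "about 15% of the activity").
    """
    ca1_activity = abundance_ratio * relative_specific_activity
    ca2_activity = 1.0  # CA II is the reference: abundance 1, specific activity 1
    return ca1_activity / (ca1_activity + ca2_activity)


# CA I is stated to be five to six times as abundant as CA II.
for ratio in (5.0, 6.0):
    share = ca1_share_of_total_activity(ratio, 0.15)
    print(f"CA I : CA II = {ratio:.0f} : 1 -> CA I share of activity = {share:.0%}")
```

Under these assumptions the CA I share falls between roughly 43% and 47%, which is consistent with the figure of about 50% given above.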

Thus, in one aspect, alpha-CA related nucleic acid and polypeptide may be useful for the
identification of compounds capable of inhibiting the alpha-CA-catalyzed reversible hydration reaction.
In one aspect, the method is carried out to identify or select CA inhibitors capable of inhibiting the
activity of the polypeptide of SEQ ID No. 390. In other aspects, the method is carried out to identify or
select CA inhibitors capable of broadly inhibiting the activity of a large number of CA enzymes. The
nucleic acid and polypeptide sequences of the invention can be used in computer based drug design or for
carrying out binding predictions with candidate CA inhibitors in view of the extensive structural
information publicly available for CA enzymes. In preferred embodiments, the nucleic acid and
polypeptide of the invention are used in drug screening assays. Assays may be cell based or non-cell based
assays. In one embodiment, a nucleotide or polypeptide sequence of the invention is brought into contact
with a candidate CA inhibitor (such as a CA II inhibitor), and binding of the candidate inhibitor to the
polypeptide of the invention, or the activity of the polypeptide of the invention, is detected. Activity of the
polypeptide of the invention may be CA activity, or any other suitable activity possessed by the
polypeptide of the invention which may be inhibited by binding of the candidate substance. Assays for
detecting hydration of carbon dioxide are well known, and referenced above. In preferred embodiments, a
panel of CA isozymes including the polypeptide of the invention is screened against the candidate
substance, including one or more enzymes selected from the group consisting of CA I, CA III, CA IV,
CA VI, and a CA-RP including but not limited to CA-RP VII, CA-RP X and CA-RP XI. In preferred
embodiments, a candidate CA inhibitor is selected according to its ability to broadly inhibit CA isozymes
capable of catalyzing the hydration of carbon dioxide. Means to conduct such drug screening assays are
well known in the art. In one embodiment drug binding is tested, using means described further herein as
well as for example in International Patent Publication No. WO 00/58510, the disclosure of which is
incorporated herein by reference in its entirety, particularly the section titled "Methods for screening
substances interacting with... polypeptides". Drug binding assays on a large panel of isozymes may also
be carried out in high throughput format using commercially available binding assay systems (Graffinity
Pharmaceutical Design GmbH, Heidelberg, Germany).

The method according to the invention may generally be used to identify or select candidate
compounds for the treatment of a disorder characterized by a disorder in pH regulation, CO2 and HCO3-
transport, ion transport or water and electrolyte balance.

Improved selective CA inhibitors

In other therapeutic strategies, CA inhibitors are delivered orally. However, systemic delivery
may affect CA enzymes present in other tissues or organs, leading to harmful side effects. It can be
expected that CA II inhibitors may also partially or fully inhibit other CA isozymes, such as CA I, or a
CA-related protein (CA-RP) such as CA-RP VIII, X, XI or CA-RPTP-beta (Tashian et al., in "Carbonic
Anhydrase: New Horizons", W.R. Chegwidden, N.D. Carter and Y.H. Edwards, Eds., Birkhauser, Basel;
Adachi, K. et al., GenBank accession number AB036836; Lovejoy et al. (1998); Peles et al. (1995)) or the
CA polypeptide of SEQ ID No 390. CA-RPTP-beta, for example, has a CA domain having no or reduced
CA activity but is thought to be involved in ligand binding or participation in protein complexes in view of
its binding of contactin by an extracellular region (Peles et al., Cell 82:251-260 (1995); Tashian et al.
(1998), supra). The inhibition of an isozyme such as CA-RPTP-beta or the isozyme of the present
invention by systemic treatment using a non-selective drug may result in harmful side effects.

In one example, while oral CA inhibitors for the treatment of glaucoma (e.g. acetazolamide) have
been effective and without ocular adverse effects, they have shown important systemic effects, including
paresthesia of the acra, fatigue, depression, renal stones and gastrointestinal complaints such as nausea
and diarrhoea (Hoyng et al., 2000, supra). Because CA inhibitors are typically used permanently (e.g. for
glaucoma), or over long periods of time, avoiding side effects is particularly important. Selective CA
inhibitors capable of inhibiting a CA isozyme of interest to a greater extent than another CA isozyme may
thus offer improved means for the treatment of disease.

In one embodiment, the nucleotide and polypeptide sequences of the present invention may be
used to design selective CA inhibitors. Studies have also shown that the different alpha-CAs have
different inhibitor binding properties (Sly et al. (1995), supra), suggesting that it may be possible to
provide compounds that inhibit a CA isozyme of interest, such as CA II, while not binding to or inhibiting
related enzymes such as the polypeptide of SEQ ID No. 390. The nucleic acid and polypeptide
sequences of the invention can be used in computer based drug design or for carrying out binding
predictions with candidate CA inhibitors in view of the extensive structural information publicly available
for CA enzymes. In preferred embodiments, the nucleic acid and polypeptide of the invention are used in
drug screening assays, including both cell based and non-cell based assays. In one embodiment, a
nucleotide or polypeptide sequence of the invention is brought into contact with a candidate CA inhibitor
(such as a CA II inhibitor), and binding of the candidate inhibitor to the polypeptide of the invention, or
the activity of the polypeptide of the invention, is detected. Activity of the polypeptide of the invention
may be CA activity, or any other suitable activity possessed by the polypeptide of the invention which
may be inhibited by binding of the candidate substance. In preferred embodiments, a panel of CA
isozymes including the polypeptide of the invention is screened against the candidate substance,
including the polypeptide of SEQ ID No 390 and one or more enzymes selected from the group consisting
of CA I, CA III, CA IV, CA VI, and a CARP including but not limited to CARP VII, CARP X and CARP XI. In
preferred embodiments, a candidate CA inhibitor is screened against one or more non-catalytic CA
related proteins to eliminate undesired inhibition of these proteins, which may be involved in other
important physiological functions. Means to conduct such drug screening assays are well known in the
art.
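
One way to summarize the results of such a panel screen is a selectivity index, taken here as the ratio of the inhibition constant (Ki) for an off-target isozyme to the Ki for the isozyme of interest. The sketch below is illustrative only: the function names are hypothetical and the Ki values are invented placeholders, not measured data.

```python
def selectivity_index(ki_off_target_nm, ki_target_nm):
    """Ratio of off-target Ki to target Ki; values above 1 favor the target isozyme."""
    if ki_target_nm <= 0 or ki_off_target_nm <= 0:
        raise ValueError("Ki values must be positive")
    return ki_off_target_nm / ki_target_nm


def rank_panel(ki_by_isozyme_nm, target):
    """Return (isozyme, selectivity index) pairs for all non-target isozymes,
    ordered from least to most selective."""
    target_ki = ki_by_isozyme_nm[target]
    pairs = [
        (name, selectivity_index(ki, target_ki))
        for name, ki in ki_by_isozyme_nm.items()
        if name != target
    ]
    return sorted(pairs, key=lambda pair: pair[1])


# Hypothetical Ki values (nM) for a single candidate compound; not measured data.
example_panel = {
    "CA II": 10.0,
    "CA I": 250.0,
    "CA IV": 80.0,
    "SEQ ID No 390 polypeptide": 5000.0,
}

for isozyme, index in rank_panel(example_panel, target="CA II"):
    print(f"{isozyme}: {index:.0f}-fold selectivity for CA II")
```

A candidate with a low index against any member of the panel would be flagged for possible off-target inhibition of that isozyme.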

Increasing alpha-CA activity for the treatment of alpha-CA deficiency disease 

The polypeptide of the invention may also be used as a source of CA activity, such as for the
treatment of disease. Defects in carbonic anhydrases are the cause of several diseases, including
osteopetrosis (abnormally dense bone), renal tubular acidosis, cerebral calcification and mental
retardation. Also, a carbonic anhydrase-related protein is described as being linked to cone-rod retinal
dystrophy (Bellinghan et al., 1998, Biochem. Biophys. Res. Comm. 253, 364-367).

In one aspect, the invention thus involves increasing CA activity by providing increased activity
of the polypeptide of SEQ ID No. 390. Increased activity of the polypeptide of SEQ ID No 390 can be
provided by any suitable means, as further described herein. Activity may be provided for example by
introducing to a host cell or patient a vector containing a nucleotide sequence of SEQ ID No 149, treating
said cell with a compound capable of increasing the expression of the polypeptide of the invention and/or
treating a cell or patient directly with a polypeptide of SEQ ID No 390. In preferred embodiments, the
polypeptide of the invention comprises at least one amino acid substitution, deletion or insertion. In one
aspect, such amino acid changes are preferably in the catalytic site; preferably said amino acid changes
involve the substitution, deletion or insertion of a His residue and preferably said amino acid changes
increase the CO2 hydration activity of the polypeptide of the invention.

Metal ion biosensors 

In further aspects, metal ion biosensors can be designed based on the polypeptide of SEQ ID No
390. Determination of metal ion concentrations in complex media such as serum and cell cytoplasm, as well
as for example seawater, is an important analytical function that requires high degrees of sensitivity and
selectivity.

Biosensors may be particularly useful in detecting metal ion fluxes in and between cells. Such
biosensors may exploit the metal-binding ability of the polypeptide of the invention, as described by
Thompson et al., who have developed such biosensors based on the CA enzyme (CA II). Such biosensors
are useful in the detection of metal ion flux for example in the central nervous system. Zinc-containing
neurons found throughout the mammalian cerebral cortex, striatum and amygdalar nuclei have been
shown to release their zinc in a depolarization- and calcium-dependent fashion in vitro and in vivo. This
zinc release has been suggested to act as a trans-synaptic neuromodulator which has in turn been linked to
excitotoxic neuronal cell death. CA based biosensors developed by Thompson et al. showed that zinc is
present and can be detected in extracellular medium from neurons (Thompson et al., J. Neurosci. Methods
96:35-45 (2000)).

Biosensors based on CA have been shown to be extremely selective, detecting Cu at
subpicomolar levels, a sensitivity comparable to that achievable with mass spectrometric techniques.
Sensors based on the CA II isozyme have been shown to detect Zn and Cu at picomolar levels, and Cd,
Co and Ni at nanomolar levels (Thompson et al., Anal. Biochem. 267:185-195 (1999)). CA based
biosensors have also demonstrated selectivity over potential interferents, such as Mg and Ca, that are
present at mM levels in extracellular fluids in biological systems (Thompson et al. (2000), supra).

Biosensors based on the polypeptide of the invention are based on the high selectivity and
sensitivity of CA isozymes for zinc. Because the binding of Zn in the active site of the enzyme affects the
enzyme's ability to bind a CA inhibitor, it is possible to use a CA inhibitor that exhibits a detectable
change upon binding to the polypeptide of the invention to detect the fraction of polypeptide bound to the
inhibitor, and therefore bound to Zn. The fraction of polypeptide with bound Zn in turn is determined by
the concentrations of free Zn and the polypeptide of the invention, and the dissociation constant for zinc.
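
The relationship between the bound fraction and the free zinc concentration can be sketched as follows. This is an illustrative sketch only: it assumes simple 1:1 binding at equilibrium and treats the free Zn concentration as known or buffered, so that the polypeptide concentration does not enter the expression. The function names and the dissociation constant used in the example are hypothetical.

```python
def fraction_zinc_bound(free_zn_molar, kd_molar):
    """Fraction of polypeptide with Zn bound, for 1:1 binding at equilibrium."""
    if free_zn_molar < 0 or kd_molar <= 0:
        raise ValueError("free Zn must be non-negative and Kd must be positive")
    return free_zn_molar / (free_zn_molar + kd_molar)


def free_zinc_from_fraction(fraction_bound, kd_molar):
    """Invert the binding isotherm to estimate free Zn from a measured bound fraction."""
    if not 0 <= fraction_bound < 1:
        raise ValueError("fraction_bound must be in the range [0, 1)")
    return kd_molar * fraction_bound / (1.0 - fraction_bound)


# Hypothetical dissociation constant for illustration only (4 pM).
KD_EXAMPLE = 4e-12

for free_zn in (1e-12, 4e-12, 4e-11):
    bound = fraction_zinc_bound(free_zn, KD_EXAMPLE)
    print(f"free Zn = {free_zn:.0e} M -> fraction bound = {bound:.2f}")
```

In this simple model the sensor responds most steeply when the free Zn concentration is near the dissociation constant, where half of the polypeptide carries bound zinc.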

In one example, binding of the CA inhibitor to the polypeptide of the invention is detected by
using a fluorescent inhibitor, whereby the inhibitor shows a detectable change in fluorescence emission
wavelength or polarization upon binding to the polypeptide of the invention. In one example, a
fluorescent sulfonamide is used, such as the fluorophore ABD-N (Thompson et al. (2000), supra).

Engineered CA enzymes

CA isozymes have been shown to have differing levels of catalytic activity and efficiency. In
preferred embodiments, particularly for treatments which involve providing the increased activity of the
polypeptide of SEQ ID No 390 or for use in metal ion biosensors, the polypeptide of the invention may be
modified for increased CO2 hydration and/or zinc binding.

In particular, studies have been carried out characterizing residues important for maximal CA
activity, allowing CA isozymes to be designed having desired levels of activity. Important structural
elements in CA isozymes for zinc binding, CO2 hydration activity and stability are reviewed in Lindskog,
Pharmacol. Ther. 74(1):1-20 (1997) and Sly (1995), supra. In one example, studies of CA III showed
that changing the Phe198 residue to a Leu198 residue (as in CA II) resulted in a 25-fold increase in
activity (Chen et al. (1993), supra). Catalysis has also been greatly increased in CA II by replacing the
Thr200 residue with His, as is normally found in CA I enzymes. Most dramatically, a CA-related protein
(CA-RP) which in its native form was missing important residues at the catalytic site and had no
detectable CO2 hydration activity at all was rendered an active CA by only two point mutations (Sjoblom
et al., FEBS Lett. 398:322-325 (1996)).

Thus, in embodiments where the polypeptide of the invention is used to provide a source of CO2
hydration or for its zinc binding properties, it is advantageous to modify the polypeptide of the invention
by introducing at least one amino acid substitution, deletion or insertion. In one aspect, such amino acid
changes are preferably in the catalytic site; preferably said amino acid changes involve the substitution,
deletion or insertion of a His residue and preferably said amino acid changes increase the CO2 hydration
activity of the polypeptide of the invention. Optimal amino acid changes can be determined by the skilled
artisan, particularly in view of sequence comparisons which can be carried out with the many
well-characterized CA isozymes.

Protein of SEQ ID NO:252 (internal designation 105-089-3-0-G10-CS)

The protein of SEQ ID NO:252 is encoded by the cDNA of SEQ ID NO:11. Accordingly, it will
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:252 described
throughout the present application also pertain to the polypeptide encoded by the human cDNA of clone
105-089-3-0-G10-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic
acid of SEQ ID NO:11 described throughout the present application also pertain to the human cDNA of
clone 105-089-3-0-G10-CS. It is over represented in fetal brain.

The protein of SEQ ID NO:252 encoded by the cDNA of SEQ ID NO:11 is distributed primarily
in the prostate and salivary gland. The protein of SEQ ID NO:252 is homologous to sequences described
in PCT publication WO9827205-A2 (which describes a protein that was isolated from a human adult
salivary gland cDNA library) and PCT publication WO9839446-A2. The
disclosure of each of the preceding PCT publications is incorporated herein by reference in its
entirety.

The protein of SEQ ID NO:252 is also homologous to a polypeptide described in PCT publication
WO9835229-A1, the disclosure of which is incorporated herein by reference in its entirety. WO9835229-A1
describes a peptide of 27 amino acid residues, 23 of which are identical to a portion of the protein of
SEQ ID NO:252 (amino acids 20-46). This corresponds to 85% identity; counting conserved changes (3 out of
the 4 differences) yields 96% homology.
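
The identity and homology percentages quoted above follow directly from the residue counts. The short sketch below reproduces the arithmetic; the helper name is hypothetical, and homology is taken here to mean identical plus conservatively substituted positions.

```python
def percent_of_alignment(matching_positions, aligned_length):
    """Percentage of aligned positions that satisfy a given match criterion."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matching_positions / aligned_length


ALIGNED_LENGTH = 27        # residues in the peptide
IDENTICAL = 23             # positions identical to amino acids 20-46 of SEQ ID NO:252
CONSERVED_CHANGES = 3      # conservative substitutions among the 4 differences

identity = percent_of_alignment(IDENTICAL, ALIGNED_LENGTH)
homology = percent_of_alignment(IDENTICAL + CONSERVED_CHANGES, ALIGNED_LENGTH)

print(f"identity: {identity:.0f}%")
print(f"homology: {homology:.0f}%")
```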

The protein described in WO 9835229 was identified in reflex tears that were collected from 12
non-contact lens wearing male and female humans. Reflex tears were stimulated by gently rubbing the
nasal mucosa with a cotton wool tipped bud. Two different batches were collected from two different
groups and examined by analytical and preparative 2-dimensional electrophoresis. After separation in the
second dimension and transfer to PVDF membranes, protein spots (identified by 0.1% (w/v) Coomassie
Blue) were loaded into a membrane-compatible Hewlett-Packard cartridge. Sequencing was conducted
with a Model G1005A (Hewlett-Packard, CA) sequenator. One of the proteins identified migrated at 25
kDa and was revealed to have 5 isoforms of different pI. Two of these were N-terminally sequenced and
gave the sequence of the above peptide with a pI of 5.0 and 4.4. The different isoforms indicate that this
protein undergoes post-translational modifications, including sialylation or acylation. The presence of
these isoforms in different degrees could reflect the disease status of the individual. Accordingly, one
embodiment of the present invention relates to the detection or diagnosis of disease by determining the
activity or level of the protein of SEQ ID NO:252 or a polynucleotide encoding the protein of SEQ ID
NO:252 in an individual. For example, detection of the secreted protein of SEQ ID NO:252 in an
individual may be accomplished non-invasively by measuring protein levels in bodily fluids into which
the protein is secreted, such as tears and saliva. Such methods may be employed both in humans and in
animals. It is probable that after the signal peptide is cleaved, the protein of SEQ ID NO:252 is secreted
into bodily fluids including tears and probably saliva.

The protein of SEQ ID NO:252 can also be used for the screening of non-ocular diseases, by
analyzing tears for marker proteins, particularly indicative of cancer and genetic disease. In addition, an
altered chromatographic profile (e.g. 2D gel) of the isoforms of the protein of SEQ ID NO:252 may also
indicate the disease state of an individual. For example, the levels of marker proteins in relation to the
protein of SEQ ID NO:252 may be determined to evaluate whether the individual is suffering from a
disease. Alternatively, tears may be analyzed for the levels of different isoforms of the protein of SEQ ID
NO:252 to determine whether the pattern of such isoforms is indicative of disease.

The protein of SEQ ID NO:252 or fragments thereof may also be used as a lubricant or cleansing
agent for the eyes. This protein can be included in contact lens washing and storage solutions. This
protein can also be useful as an ingredient in eye washing solutions (e.g. eye drops) used for everyday
redness or healing after surgical/laser intervention. For example, the protein may be used to reduce eye
inflammation. Alternatively, anti-bacterial properties may be exploited by including the protein of SEQ
ID NO:252 or fragments thereof in solutions, creams or ointments for the eyes, as well as creams or
ointments in general for external applications.

Accordingly, the present invention includes the use of the protein of SEQ ID NO:252, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition in an
individual. In such embodiments, the protein of SEQ ID NO:252, or a fragment thereof, is administered
to an individual in whom it is desired to increase or decrease any of the activities of the protein of SEQ ID
NO:252. The protein of SEQ ID NO:252 or fragment thereof may be administered directly to the
individual or, alternatively, a nucleic acid encoding the protein of SEQ ID NO:252 or a fragment thereof
may be administered to the individual. Alternatively, an agent which increases the activity of the protein
of SEQ ID NO:252 may be administered to the individual. Such agents may be identified by contacting
the protein of SEQ ID NO:252 or a cell or preparation containing the protein of SEQ ID NO:252 with a
test agent and assaying whether the test agent increases the activity of the protein. For example, the test
agent may be a chemical compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:252 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:252 may be identified by contacting the protein of SEQ ID NO:252 or a cell or
preparation containing the protein of SEQ ID NO:252 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify the source of a sample as, for example,
saliva or tears, or to distinguish between two or more possible sources of a sample on the basis of the
level of the protein of SEQ ID NO:252 in the sample. For example, the protein of SEQ ID NO:252 or
fragments thereof may be used to generate antibodies using any techniques known to those skilled in the
art, including those described herein. Such antibodies may then be used to identify tissues of unknown
origin, for example, forensic samples or differentiated tumor tissue that has metastasized to foreign bodily
sites, or to differentiate different tissue types in a tissue cross-section using immunochemistry. In such
methods a sample is contacted with the antibody, which may be detectably labeled, under conditions
which facilitate antibody binding. The level of antibody binding to the test sample is measured and
compared to the level of binding to control cells from saliva or tears or tissues other than saliva or tears to
determine whether the test sample is from saliva or tears. Alternatively, the level of the protein of SEQ
ID NO:252 in a test sample may be measured by determining the level of RNA encoding the protein of
SEQ ID NO:252 in the test sample. RNA levels may be measured using nucleic acid arrays or using
techniques such as in situ hybridization, Northern blots, dot blots or other techniques familiar to those
skilled in the art. If desired, an amplification reaction, such as a PCR reaction, may be performed on the
nucleic acid sample prior to analysis. The level of RNA in the test sample is compared to RNA levels in
control cells from saliva or tears or tissues other than saliva or tears to determine whether the test sample
is from saliva or tears.

In another embodiment, antibodies to the protein of the invention or part thereof may be used for
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:252, including using
methods known to those skilled in the art. For example, an antibody against the protein of SEQ ID
NO:252 or a fragment thereof may be fixed to a solid support, such as a chromatography matrix. A
preparation containing cells expressing the protein of SEQ ID NO:252 is placed in contact with the
antibody under conditions which facilitate binding to the antibody. The support is washed and then the
cells are released from the support by contacting the support with agents which cause the cells to
dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:252 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of
SEQ ID NO:252. In such techniques, the level of the protein of SEQ ID NO:252 in an ill individual is
measured using techniques such as those described herein. The level of the protein of SEQ ID NO:252 in the ill
individual is compared to the level in normal individuals to determine whether the individual has a level
of the protein of SEQ ID NO:252 which is indicative of disease.

Protein of SEQ ID NO:308 (internal designation 187-41-0-0-i21-CS)

The protein of SEQ ID NO:308 is encoded by the cDNA of SEQ ID NO:67. Accordingly, it will 
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:308 described 
throughout the present application also pertain to the polypeptide encoded by the human cDNA of clone 
187-41-0-0-i21-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic acid 
of SEQ ID NO:67 described throughout the present application also pertain to the human cDNA of clone
187-41-0-0-i21-CS. 

The protein of SEQ ID NO:308 is highly homologous to human secreted protein nf87_l from
PCT publication WO 9935252-A2 (the disclosure of which is incorporated herein by reference in its
entirety), to amino acids 26-129 of the human secreted protein SEQ ID NO:441 from PCT publication
WO 9906548-A2 (the disclosure of which is incorporated herein by reference in its entirety), and to
amino acids 26-114 of human secreted protein SEQ ID NO:439 from PCT publication WO 9906548-A2,
the disclosure of which is incorporated herein by reference in its entirety. Thus, the protein of the
invention appears to be a polymorphic variant of nf87_l. Since most of the proteins with high homology
to the sequence of the invention have longer 5' termini, it is conceivable that the protein of the invention is
a truncated/spliced variant of these proteins.

The protein of SEQ ID NO:308 was identified among the cDNAs from a library constructed from 
brain. Tissue distribution analysis through a BLAST analysis of databases shows that mRNA encoding 
this protein was found primarily in kidney, liver, and cancerous prostate. 






The protein of SEQ ID NO:308 has chemical and structural homology to human interferon-inducible
(IFI) protein isoforms p27 (63% identity), HIFI (50% identity), and to interferon-induced protein 6-16
precursor (IFI-6-16, 36% identity). Furthermore, the protein of the invention has structural homology (40%
identity) to the human erythropoietin (EPO) primary response gene, EPRG3pt from PCT publication WO
9906063-A2, the disclosure of which is incorporated herein by reference in its entirety. Thus, the present
invention relates to nucleic acid and amino acid sequences of a novel IFI protein and to the use of these
sequences in the diagnosis, study, prevention and treatment of disease.
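
As an illustration of how identity percentages such as those quoted above are commonly obtained, the following minimal Python sketch counts matching residues over the aligned, ungapped columns of a pairwise alignment. It is not the method used to generate the figures in this application; the function name and the example sequences are hypothetical, and other conventions for the denominator (for example, alignment length including gaps) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-').

    Assumes the two strings are rows of an existing pairwise alignment and
    therefore have equal length. This is only one of several conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment, not taken from any sequence in this application.
print(percent_identity("MKT-AL", "MRTQAL"))
```

In this toy case five columns are ungapped and four of them match.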

The protein of SEQ ID NO:308 comprises 105 amino acids. From the amino acid alignments and 
the hydrophobicity plots, it has a predicted signal peptide sequence spanning residues 31-43 and two 
predicted transmembrane domains spanning residues 17-37, and 48-68. Accordingly, one embodiment of
the present invention is a polypeptide comprising the signal peptide and/or one or more of the
transmembrane domains.
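
The residue ranges given above are 1-based and inclusive. A minimal Python sketch of extracting the corresponding segments from a sequence string follows; the placeholder sequence is hypothetical (it is not SEQ ID NO:308, which is given in the sequence listing), and the region names are labels chosen for this example only.

```python
# Residue ranges as stated in the text (1-based, inclusive).
REGIONS = {
    "signal_peptide": (31, 43),
    "transmembrane_1": (17, 37),
    "transmembrane_2": (48, 68),
}


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("region lies outside the sequence")
    return sequence[start - 1:end]


# Hypothetical 105-residue placeholder, used only to make the sketch runnable.
placeholder = ("ACDEFGHIKLMNPQRSTVWY" * 6)[:105]

for name, (start, end) in REGIONS.items():
    segment = extract_region(placeholder, start, end)
    print(name, start, end, len(segment))
```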

Interferons (IFNs) are a part of the group of intercellular messenger proteins known as cytokines. 
α-IFN is the product of a multigene family of at least 16 members, whereas β-IFN is the product of a
single gene. α- and β-IFNs are also known as type I IFNs. Type I IFNs are produced in a variety of cell
types. Biosynthesis of type I IFNs is stimulated by viruses and other pathogens, and by various cytokines
and growth factors. γ-IFN, also known as type II IFN, is produced in T-cells and natural killer cells.
Antigens to which the organism has been sensitized stimulate biosynthesis of type II IFN. Both α- and
γ-IFNs are immunomodulators and anti-inflammatory agents, activating macrophages, T-cells and natural
killer cells.

IFNs are part of the body's natural defense to viruses and tumors. They exert these defenses by 
affecting the function of the immune system and by direct action on pathogens and tumor cells. IFNs 
mediate these multiple effects in part by inducing the synthesis of many cellular proteins. Some 
interferon-inducible (IFI) genes are induced equally well by α-, β- and γ-IFNs. Other IFI genes are
preferentially induced by the type I or by the type II IFNs. The various proteins produced by IFI genes
possess antitumor, antiviral and immunomodulatory functions. The expression of tumor antigens in
cancer cells is increased by α-IFN, and renders the cancer cells more susceptible to immune rejection.
The IFI proteins synthesized in response to viral infections are known to inhibit viral functions such as
cell penetration, uncoating, RNA and protein synthesis, assembly and release (Hardman JG et al 25
(1996) The Pharmacological Basis of Therapeutics, McGraw-Hill, New York NY pp 1211-1214, the
disclosure of which is incorporated herein by reference in its entirety). Type II IFN stimulates expression 
of major histocompatibility complex (MHC) proteins and is thus used in immune response enhancement. 

The IFI gene known as 6-16 encodes an mRNA, which is highly induced by type I IFNs in a 
variety of human cells (Kelly JM et al (1986) EMBO J 5:1601-1606, the disclosure of which is 
incorporated herein by reference in its entirety). After induction, 6-16 mRNA constitutes as much as
0.1% of the total cellular mRNA. The 6-16 mRNA is present at only very low levels in the absence of
type I IFN, and is only weakly induced by type II IFN. The 6-16 mRNA encodes a hydrophobic protein
of 130 amino acids. The first 20 to 23 amino acids comprise a putative signal peptide. Protein 6-16 has at
least two predicted transmembrane regions culminating in a negatively charged C-terminus.

The p27 gene encodes a protein with 41% amino acid sequence identity to the 6-16 protein. The 
p27 gene is expressed in some breast tumor cell lines and in a gastric cancer cell line. In other breast 
tumor cell lines, in the HeLa cervical cancer cell line, and in fetal lung fibroblasts, p27 expression occurs 
only upon α-IFN induction. In one breast tumor cell line, p27 is independently induced by estradiol and
by IFN (Rasmussen UB et al (1993) Cancer Res 53:4096-4101, the disclosure of which is incorporated
herein by reference in its entirety). Expression of p27 was analyzed in 21 primary invasive breast
carcinomas, 1 breast cancer bone metastasis, and 3 breast fibroadenomas. High levels of p27 were found
in about one-half of the primary carcinomas and in the bone metastasis, but not in the fibroadenomas.
These observations suggest that certain breast tumors may produce high levels of, or have increased
sensitivity to, type I IFN as compared to other breast tumors (Rasmussen UB et al, supra). In addition,
the p27 gene is expressed at significant levels in normal tissues including colon, stomach and lung, but is not
expressed in placenta, kidney, liver or skin (Rasmussen UB et al, supra).

The small hydrophobic IFI gene products may contribute to viral resistance. A hepatitis-C virus
(HCV)-induced gene, 130-51, was isolated from a cDNA library prepared from chimpanzee liver during
the acute phase of the infection. The protein product of this gene has 97% identity to the human 6-16
protein (Kato T et al (1992) Virology 190:856-860, the disclosure of which is incorporated herein by
reference in its entirety). The authors of the preceding paper suggest that HCV infection actively induces
IFN expression, which in turn induces expression of IFI genes including 130-51. The IFI proteins
synthesized in response to viral infections are known to inhibit viral functions such as penetration,
uncoating, RNA or protein synthesis, assembly or release. The 130-51 protein may inhibit one or more of
these functions in HCV. A particular virus may be inhibited in multiple functions by IFI proteins. In
addition, the principal inhibitory effect exerted by IFI proteins differs among the virus families (Hardman
JG, supra, p 1211, the disclosure of which is incorporated herein by reference).

The HIFI protein (PCT publication WO 9812223-A2, the disclosure of which is incorporated
herein by reference in its entirety) is a human sequence identified among cDNAs from a library
constructed from human neonatal kidney. Northern blot analysis using the LIFESEQ™ database (Incyte
Pharmaceuticals, Palo Alto, CA) shows that HIFI mRNA was found only in neonatal kidney. The HIFI
protein consists of 104 amino acids and has 55%, 45%, and 46% amino acid sequence identity to p27,
6-16 and 130-51, respectively.

Based on the chemical and structural homology between the protein of SEQ ID NO:308 and the
small hydrophobic IFI proteins from human and chimpanzee, it is believed that the protein of SEQ ID
NO:308 is synthesized when interferons are produced in infections, inflammation, autoimmune diseases,
etc. Interferons are produced in response to various cytokines and growth factors, in viral infections,
inflammation, autoimmune diseases, and cancers. Accordingly, the protein of SEQ ID NO:308 or
fragments thereof may be used in diagnosis and treatment of diseases such as, but not limited to,
autoimmune disorders such as rheumatoid arthritis, Graves' disease, systemic lupus erythematosus,
autoimmune hepatitis, Wegener's granulomatosis, sarcoidosis, polyarthritis, pemphigus, pemphigoid,
erythema multiforme, Sjogren's syndrome, inflammatory bowel disease, multiple sclerosis, myasthenia
gravis, keratitis, scleritis, Type I diabetes, insulin-dependent diabetes mellitus, lupus nephritis, and
allergic encephalomyelitis; proliferative disorders including various forms of cancer such as leukemias,
lymphomas (Hodgkin's and non-Hodgkin's), sarcomas, melanomas, adenomas, carcinomas of solid tissue,
hypoxic tumors, squamous cell carcinomas of the mouth, throat, larynx, and lung, genitourinary cancers
such as cervical and bladder cancer, hematopoietic cancers, head and neck cancers, and nervous system
cancers, benign lesions such as papillomas, atherosclerosis, angiogenesis; viral infections, in particular
HCV and HIV infections, as well as other pathogen-induced infections (e.g. leishmania).

The protein of SEQ ID NO:308 or fragments thereof may also be used to treat conditions
associated with inflammation or immune impairment (e.g. rheumatoid arthritis, osteoarthritis and AIDS).

Another embodiment of the present invention relates to the use of the protein of SEQ ID NO:308
or fragments thereof to treat and/or prevent the ill effects of bacterial infection during pregnancy in
mammals, such as spontaneous abortion and maternal death. In a preferred embodiment, the protein of
the invention may be used to counteract the effects of the bacterial endotoxin lipopolysaccharide (LPS).
The methods for using such compositions are described in Dziegielewska and Andersen, Biol. Neonate,
74:372-5 (1998), the disclosure of which is incorporated herein by reference in its entirety.

Furthermore, the protein of SEQ ID NO:308 or fragments thereof is useful as a reagent for
analyzing the control of gene expression by interferons and other cytokines in both normal and diseased
cells. The protein of SEQ ID NO:308 or fragments thereof may be used to identify specific molecules
with which it binds such as agonists, antagonists or inhibitors.

Another embodiment of the present invention relates to methods of using the protein of SEQ ID
NO:308 or fragments thereof to identify and/or quantify cytokines of the interferon family as well as other
cytokines such as IL-10 and tumor antigens, which may interact with the protein of the invention.

The protein of SEQ ID NO:308 or fragments thereof may also be included in pharmaceutical 
preparations for treating cancer or prevention/treatment of other diseases associated with changes in 
expression of the protein of the invention. In another embodiment of the present invention, the protein of 
SEQ ID NO:308 or fragments thereof is used to inhibit and/or modulate the effect of cytokines and related
molecules such as IL-2, TNF alpha, CTLA4, CD28, and others, by preventing the binding of the
endogenous cytokines to their natural receptors, thereby blocking cell proliferation or inhibitory signals
generated by the ligand-receptor binding event.

The protein of SEQ ID NO:308 or fragments thereof is useful to correct defects in in vivo models
of disease such as autoimmune, inflammation and tumor models, by injecting the protein either
intraperitoneally, intravenously, subcutaneously or directly in the diseased tissue.

The DNA encoding the protein of SEQ ID NO:308 or fragments thereof is useful in diagnostic
assays for conditions/diseases associated with expression of the protein of the invention. The diagnostic
assay is useful to distinguish between absence, presence, and excess expression of the protein of the
invention and to monitor regulation of levels of the protein of the invention during therapeutic
intervention. The DNA may also be incorporated into effective eukaryotic expression vectors and
directly targeted to a specific tissue, organ, or cell population for use in gene therapy to treat the
above-mentioned conditions, including tumors, and/or to correct disease- or genetic-induced defects in any of the
above-mentioned proteins including the protein of the invention. The DNA may also be used to design
antisense sequences and ribozymes, which can be administered to modify gene expression in tumor and
pathogen-infected cells and to influence expression of cytokines and growth factors. In vivo delivery of
genetic constructs into subjects can be developed to the point of targeting specific cell types, such as
tumor cells, where expression of the protein of the invention may be affected or is modulating the expression
and/or activity of other proteins such as cytokines, growth factors, their receptors and/or tumor antigens.
The DNA is also useful to detect unknown upstream sequences (e.g. promoters and regulatory elements) by
standard techniques and for research into the control of gene expression by interferons and other
cytokines, as well as growth and transcription factors in normal and diseased cells. Hybridization probes
are useful to detect DNA encoding the protein of the invention (or closely related molecules) in biological
samples, and for mapping the naturally occurring genomic sequence to a particular
chromosome/chromosome region. The DNA may be used to generate and/or treat in vivo animal models
of disease, including susceptibility or resistance to infection, inflammation, tumors and autoimmune
conditions, as well as tumor therapy, based on vaccine, knock-out and transgene technologies.

Antibodies against the protein of SEQ ID NO:308 or fragments thereof are useful for the
diagnosis of conditions and diseases associated with its expression and to quantify the protein of the
invention (e.g. in assays to monitor patients during therapeutic intervention). Antibodies specific for the
protein may include, but are not limited to, polyclonal, monoclonal, chimeric, and single-chain antibodies
and Fab fragments produced by a Fab expression library. Neutralizing antibodies are especially preferred
for diagnostics and therapeutics. Diagnostic assays for the protein of the invention include methods
utilizing the antibody and a label to detect the protein of the invention in human body fluids or extracts
of cells or tissues.

The protein of the invention and its catalytic or immunogenic fragments or oligopeptides
can be used for screening therapeutic compounds in any of a variety of drug screening techniques, including
high-throughput screening. Methods which may be used to quantitate the expression of the nucleotide or protein of
the invention include, but are not limited to, polymerase chain reaction (PCR), RT-PCR, RNAse
protection, Northern and western blotting, enzyme-linked immunosorbent assay (ELISA),
radioimmunoassay (RIA), fluorescence-activated cell sorting (FACS), immunoprecipitation, and
chromatography.

Under conditions of significant blood loss, EPO therapy, or both, iron-restricted erythropoiesis is
evident. However, intravenous or oral iron therapy has substantial drawbacks. Moreover, traditional
biochemical markers of storage iron in patients with anemia of chronic disease are unhelpful in the
assessment of iron status (Lawrence T et al (2000) Blood 96:823-833, the disclosure of which is
incorporated herein by reference in its entirety). As the protein of SEQ ID NO:308 bears homology to the
human erythropoietin (EPO) primary response gene, EPRG3pt, it may be used to promote red blood cell
formation or to monitor the value of safer intravenous iron preparations in patients with blood loss
anemia, particularly those undergoing EPO therapy.

The hydrophobic IFI protein of SEQ ID NO:308 or fragments thereof may be used to diagnose 
conditions associated with its induction. For example, the protein of SEQ ID NO:308 or fragments thereof
may be useful in the diagnosis and treatment of tumors, viral infections, inflammation, or conditions
associated with impaired immunity, anemia of chronic blood loss or chronic disease, hemochromatosis, 
and EPO therapy. Furthermore, this protein may be used for investigating the control of gene expression 
by IFNs and other cytokines, as well as hormones and growth factors, in normal and diseased cells. 

The protein of SEQ ID NO:308 or fragments thereof is useful to correct defects in in vivo models
of disease such as autoimmune, inflammation, anemia, iron-overload and tumor models, by injecting the
protein either intraperitoneally, intravenously, subcutaneously or directly in the diseased tissue.

In addition, the protein of SEQ ID NO:308 is structurally related to other proteins having 
homology and/or structural similarity with human p27 (Rasmussen, U.B., et al., 1993, Cancer Research 
53:4096-4101, the disclosure of which is incorporated herein by reference). Accordingly, the protein of
brain, fetal brain, kidney, fetal kidney, or colon may be used to regulate the proliferation of
EPO-dependent cells or the growth and development of erythroid and other hematopoietic lineages.

The protein of SEQ ID NO:308 or fragments thereof, or polynucleotides encoding the protein of 
SEQ ID NO:308 or fragments thereof, may be used to treat or ameliorate anemia of chronic disease and 
chronic renal failure, polycythemia, cancer, AIDS, drug- and phlebotomy-induced anemias, 
hemochromatosis, erythropoiesis mediated by EPO therapy, and other conditions associated with altered
activity or levels of the protein of SEQ ID NO:308. 

In another embodiment, the present invention relates to methods for identifying agonists and 
antagonists/inhibitors using the protein of SEQ ID NO:308 or fragments thereof, and treating conditions 
with the identified compounds. In a still further aspect, the invention relates to diagnostic assays for
detecting diseases associated with inappropriate levels or activity of the protein of SEQ ID NO:308.
Still another embodiment of the invention relates to the use of the protein of SEQ ID NO:308, fragments
thereof, or the DNA encoding the protein of SEQ ID NO:308 or fragments thereof to monitor the value of
iron therapy in patients undergoing EPO therapy, or experiencing blood loss, or both.

The DNA encoding the protein of SEQ ID NO:308 or fragments thereof is useful in diagnostic
assays for conditions/diseases associated with abnormal expression of the protein of SEQ ID NO:308.
The diagnostic assay is useful to distinguish between absence, presence, and excess expression of the
protein of the invention and to monitor regulation of levels of the protein of the invention during
therapeutic intervention. The DNA may also be incorporated into effective eukaryotic expression vectors
and directly targeted to a specific tissue, organ, or cell population for use in gene therapy to treat the
above-mentioned conditions, including tumors, and/or to correct disease- or genetic-induced defects in any
of the above-mentioned proteins including the protein of the invention. The DNA may also be used to
design antisense sequences and ribozymes, which can be administered to modify gene expression in
tumor and pathogen-infected cells and to influence expression of cytokines, hormones and growth factors.
In vivo delivery of genetic constructs into subjects can be developed to the point of targeting specific cell
types, such as tumor cells, where expression of the protein of the invention may be affected or is modulating the
expression and/or activity of other proteins such as cytokines, growth factors, their receptors and/or tumor
antigens. The DNA is also useful to detect unknown upstream sequences (e.g. promoters and regulatory
elements) by standard techniques and for research into the control of gene expression by interferons and
other cytokines, as well as growth and transcription factors in normal and diseased cells. Hybridization
probes are useful to detect DNA encoding the protein of the invention (or closely related molecules) in
biological samples, and for mapping the naturally occurring genomic sequence to a particular
chromosome/chromosome region. The DNA may be used to generate and/or treat in vivo animal models
of disease, including susceptibility or resistance to infection, tumors, autoimmune conditions, anemia and
iron-overload, as well as tumor therapy, based on vaccine, knock-out and transgene technologies.

Antibodies against the protein of SEQ ID NO:308 are useful for the diagnosis of conditions and
diseases associated with its expression and to quantify the protein of the invention (e.g. in assays to
monitor patients during therapeutic intervention). Antibodies specific for the protein may include, but are
not limited to, polyclonal, monoclonal, chimeric, and single-chain antibodies and Fab fragments produced by a Fab
expression library. Neutralizing antibodies are especially preferred for diagnostics and therapeutics.
Diagnostic assays for the protein of SEQ ID NO:308 include methods utilizing the antibody and a label to
detect the protein of the invention in human body fluids or extracts of cells or tissues.

The protein of SEQ ID NO:308 and its catalytic or immunogenic fragments or oligopeptides
can be used for screening therapeutic compounds in any of a variety of drug screening techniques,
including high-throughput screening. Methods which may be used to quantitate the expression of the nucleotide or
protein of the invention include, but are not limited to, polymerase chain reaction (PCR), RT-PCR,
RNAse protection, Northern blotting, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay
(RIA), fluorescence-activated cell sorting (FACS), immunoprecipitation, and chromatography.

Accordingly, the present invention includes the use of the protein of SEQ ID NO:308, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition in an
individual. For example, the condition may be cancer, including breast cancer, viral infection, bacterial
infection, inflammation, autoimmune disorders, rheumatoid arthritis, Graves' disease, systemic lupus
erythematosus, autoimmune hepatitis, Wegener's granulomatosis, sarcoidosis, polyarthritis, pemphigus,
pemphigoid, erythema multiforme, Sjogren's syndrome, inflammatory bowel disease, multiple sclerosis,
myasthenia gravis, keratitis, scleritis, Type I diabetes, insulin-dependent diabetes mellitus, lupus
nephritis, and allergic encephalomyelitis; proliferative disorders including various forms of cancer such
as leukemias, lymphomas (Hodgkin's and non-Hodgkin's), sarcomas, melanomas, adenomas, carcinomas
of solid tissue, hypoxic tumors, squamous cell carcinomas of the mouth, throat, larynx, and lung,
genitourinary cancers such as cervical and bladder cancer, hematopoietic cancers, head and neck cancers,
and nervous system cancers, benign lesions such as papillomas, atherosclerosis, angiogenesis; viral
infections, in particular HCV and HIV infections, as well as other pathogen-induced infections (e.g.
leishmania).
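
To make the notion of fragments of at least a given number of consecutive amino acids concrete, the following Python sketch enumerates every window of a chosen length along a sequence. It is an illustration only; the function name and the placeholder sequence are hypothetical. For a 105-residue protein such as the one described above, a length-k window can start at 105 - k + 1 positions, so the listed lengths of 150 and 200 yield no fragments.

```python
# Fragment lengths listed in the text.
FRAGMENT_LENGTHS = (5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, 200)


def consecutive_fragments(sequence: str, length: int) -> list[str]:
    """Return every run of `length` consecutive residues in `sequence`.

    If `length` exceeds the sequence length, the result is an empty list.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Hypothetical 105-residue placeholder standing in for a real sequence.
placeholder = ("ACDEFGHIKLMNPQRSTVWY" * 6)[:105]

for k in FRAGMENT_LENGTHS:
    print(k, len(consecutive_fragments(placeholder, k)))
```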

In such embodiments, the protein of SEQ ID NO:308, or a fragment thereof, is administered to
an individual in whom it is desired to increase or decrease any of the activities of the protein of SEQ ID
NO:308. The protein of SEQ ID NO:308 or fragment thereof may be administered directly to the
individual or, alternatively, a nucleic acid encoding the protein of SEQ ID NO:308 or a fragment thereof
may be administered to the individual. Alternatively, an agent which increases the activity of the protein
of SEQ ID NO:308 may be administered to the individual. Such agents may be identified by contacting
the protein of SEQ ID NO:308 or a cell or preparation containing the protein of SEQ ID NO:308 with a
test agent and assaying whether the test agent increases the activity of the protein. For example, the test
agent may be a chemical compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:308 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:308 may be identified by contacting the protein of SEQ ID NO:308 or a cell
or preparation containing the protein of SEQ ID NO:308 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify the source of a sample as, for example,
brain, kidney, liver, or cancerous prostate, or to distinguish between two or more possible sources of a
sample on the basis of the level of the protein of SEQ ID NO:308 in the sample. For example, the protein
of SEQ ID NO:308 or fragments thereof may be used to generate antibodies using any techniques known
to those skilled in the art, including those described herein. Such antibodies may then be used to identify
tissues of unknown origin, for example, forensic samples, differentiated tumor tissue that has
metastasized to foreign bodily sites, or to differentiate different tissue types in a tissue cross-section using
immunochemistry. In such methods a sample is contacted with the antibody, which may be detectably
labeled, under conditions which facilitate antibody binding. The level of antibody binding to the test
sample is measured and compared to the level of binding to control cells from brain, kidney, liver, or
cancerous prostate or tissues other than brain, kidney, liver, or cancerous prostate to determine whether
the test sample is from brain, kidney, liver, or cancerous prostate. Alternatively, the level of the protein
of SEQ ID NO:308 in a test sample may be measured by determining the level of RNA encoding the
protein of SEQ ID NO:308 in the test sample. RNA levels may be measured using nucleic acid arrays or
using techniques such as in situ hybridization, Northern blots, dot blots or other techniques familiar to
those skilled in the art. If desired, an amplification reaction, such as a PCR reaction, may be performed
on the nucleic acid sample prior to analysis. The level of RNA in the test sample is compared to RNA
levels in control cells from brain, kidney, liver, or cancerous prostate or tissues other than brain, kidney,
liver, or cancerous prostate to determine whether the test sample is from brain, kidney, liver, or cancerous
prostate.
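
The comparison of a test sample against control tissues can be sketched, under simplifying assumptions, as choosing the control whose measured level lies closest to that of the test sample. The Python example below is such a sketch and not a validated assay procedure: the function name, the tissue labels chosen for the controls, and all numeric levels are hypothetical, and a real comparison would account for replicate measurements and assay variability.

```python
def closest_control(test_level: float, control_levels: dict[str, float]) -> str:
    """Return the control tissue whose level is nearest to the test level.

    `control_levels` maps a tissue name to a measured binding or RNA level,
    expressed in the same (arbitrary) units as `test_level`.
    """
    if not control_levels:
        raise ValueError("at least one control is required")
    return min(control_levels, key=lambda tissue: abs(control_levels[tissue] - test_level))


# Hypothetical signal levels in arbitrary units, for illustration only.
controls = {"brain": 9.0, "kidney": 4.0, "skeletal muscle": 0.5}
print(closest_control(8.5, controls))
```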

In another embodiment, antibodies to the protein of the invention or part thereof may be used for
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:308, including using
methods known to those skilled in the art. For example, an antibody against the protein of SEQ ID
NO:308 or a fragment thereof may be fixed to a solid support, such as a chromatography matrix. A
preparation containing cells expressing the protein of SEQ ID NO:308 is placed in contact with the
antibody under conditions which facilitate binding to the antibody. The support is washed and then the
cells are released from the support by contacting the support with agents which cause the cells to
dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:308 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of
SEQ ID NO:308. In such techniques, the level of the protein of SEQ ID NO:308 in an ill individual is
measured using techniques such as those described herein. The level of the protein of SEQ ID NO:308 in 
the ill individual is compared to the level in normal individuals to determine whether the individual has a 
level of the protein of SEQ ID NO:308 which is associated with disease. 
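
One simple way to express the comparison between an individual's level and the levels in normal individuals is a z-score against a reference group. The Python sketch below illustrates that idea only; the cutoff of 2.0 standard deviations, the function names, and the sample values are assumptions made for this example and are not taken from the present application.

```python
from statistics import mean, stdev


def z_score(patient_level: float, normal_levels: list[float]) -> float:
    """Number of sample standard deviations separating a patient's level
    from the mean of a reference group of normal individuals."""
    if len(normal_levels) < 2:
        raise ValueError("need at least two reference measurements")
    spread = stdev(normal_levels)
    if spread == 0:
        raise ValueError("reference measurements show no variation")
    return (patient_level - mean(normal_levels)) / spread


def is_outside_reference(patient_level: float, normal_levels: list[float],
                         cutoff: float = 2.0) -> bool:
    """Flag a level lying more than `cutoff` standard deviations from the
    reference mean. The default cutoff is an illustrative assumption."""
    return abs(z_score(patient_level, normal_levels)) > cutoff


# Hypothetical protein levels in arbitrary units.
normal = [10.0, 12.0, 11.0, 9.0, 13.0]
print(is_outside_reference(16.0, normal))
print(is_outside_reference(12.0, normal))
```

A level flagged in this way indicates only a departure from the reference group; whether it is associated with disease would depend on clinical validation.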

Proteins of SEQ ID NOs:289 and 307 (internal designations 175-1-3-0-E5-CS.cor and 187-39-0-0-kl2-CS)

The protein of SEQ ID NO:289 is encoded by the cDNA of SEQ ID NO:48. Accordingly, it will 
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:289 described 
throughout the present application also pertain to the polypeptide encoded by the human cDNA of clone 
175-1-3-0-E5-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic acid of 
SEQ ID NO:48 described throughout the present application also pertain to the human cDNA of clone
175-1-3-0-E5-CS. 

The protein of the invention consists of 130 amino acids. From the amino acid alignments and 
the hydrophobicity plots, it has a predicted signal peptide sequence spanning residues 8-20 and four 
predicted transmembrane domains spanning residues 2-24, 42-61, 70-90 and 99-119. Accordingly, some
embodiments of the present invention relate to polypeptides comprising the signal peptide and/or one or
more of the transmembrane domains. 

The protein of SEQ ID NO:289 encoded by the cDNA of SEQ ID NO:48 is homologous to SEQ
ID NO:4199 from EP 1033401-A2 (the disclosure of which is incorporated herein by reference in its
entirety), a human secreted protein. Another protein, SEQ ID NO:307, encoded by the cDNA of SEQ ID
NO:66, is a polymorphic variant of the protein of SEQ ID NO:289, and shares all of the herein-described
functions and uses.

The present invention relates to a novel protein identified among the cDNAs from a library 
constructed from salivary gland, and to the use of the nucleic acid and amino acid sequences disclosed 
herein in the study, diagnosis, prevention, and treatment of disease. Tissue distribution analysis predicted 
by BLAST on databases shows that mRNA encoding this protein was found primarily in brain and fetal
brain, with lower amounts in kidney, fetal kidney and colon. 

Interferons (IFNs) are a part of the group of intercellular messenger proteins known as cytokines. 
α-IFN is the product of a multigene family of at least 16 members, whereas β-IFN is the product of a
single gene. α- and β-IFNs are also known as type I IFNs. Type I IFNs are produced in a variety of cell
types. Biosynthesis of type I IFNs is stimulated by viruses and other pathogens, and by various cytokines
and growth factors. γ-IFN, also known as type II IFN, is produced in T-cells and natural killer cells.
Antigens to which the organism has been sensitized stimulate biosynthesis of type II IFN. Both α- and
γ-IFNs are immunomodulators and anti-inflammatory agents, activating macrophages, T-cells and natural
killer cells.

IFNs are part of the body's natural defense to viruses and tumors. They exert these defenses by 
affecting the function of the immune system and by direct action on pathogens and tumor cells. IFNs 
mediate these multiple effects in part by inducing the synthesis of many cellular proteins. Some
interferon-inducible (IFI) genes are induced equally well by α-, β- and γ-IFNs. Other IFI genes are
preferentially induced by the type I or by the type II IFNs. The various proteins produced by IFI genes
possess antitumor, antiviral and immunomodulatory functions. The expression of tumor antigens in
cancer cells is increased by α-IFN, which renders the cancer cells more susceptible to immune rejection.
The IFI proteins synthesized in response to viral infections are known to inhibit viral functions such as
cell penetration, uncoating, RNA and protein synthesis, assembly and release (Hardman JG et al
(1996) The Pharmacological Basis of Therapeutics, McGraw-Hill, New York NY pp 1211-1214, the 
disclosure of which is incorporated herein by reference in its entirety). Type II IFN stimulates expression 
of major histocompatibility complex (MHC) proteins and is thus used in immune response enhancement.

The protein of SEQ ID NO:289 is a small hydrophobic protein having chemical and structural
homology to human interferon-inducible (IFI) protein isoforms 6-16 (97% identity), HIFI (44%), and p27
(33%), as well as 130-51, the chimpanzee homolog of 6-16 (97%). Thus, the protein of SEQ ID
NO:289 and the nucleic acid encoding it are polymorphic variants of 6-16 or the gene encoding 6-16.
The protein of SEQ ID NO:289, fragments thereof, or nucleic acids encoding the protein of SEQ ID
NO:289 or fragments thereof may be used in the diagnosis, study, prevention and treatment of disease as
described below.

The IFI gene known as 6-16 encodes an mRNA, which is highly induced by type I IFNs in a 
variety of human cells (Kelly JM et al (1986) EMBO J 5:1601-1606, the disclosure of which is 
incorporated herein by reference in its entirety). After induction, 6-16 mRNA constitutes as much as 0.1%
of the total cellular mRNA. The 6-16 mRNA is present at only very low levels in the absence of type I
IFN, and is only weakly induced by type II IFN. The 6-16 mRNA encodes a hydrophobic protein of 130
amino acids. The first 20 to 23 amino acids comprise a putative signal peptide. Protein 6-16 has at least 
two predicted transmembrane regions culminating in a negatively charged C-terminus. 

The p27 gene encodes a protein with 41% amino acid sequence identity to the 6-16 protein. The 
p27 gene is expressed in some breast tumor cell lines and in a gastric cancer cell line. In other breast
tumor cell lines, in the HeLa cervical cancer cell line, and in fetal lung fibroblasts, p27 expression occurs
only upon α-IFN induction. In one breast tumor cell line, p27 is independently induced by estradiol and
by IFN (Rasmussen UB et al (1993) Cancer Res 53:4096-4101, the disclosure of which is incorporated
herein by reference in its entirety). Expression of p27 was analyzed in 21 primary invasive breast
carcinomas, 1 breast cancer bone metastasis, and 3 breast fibroadenomas. High levels of p27 were found
in about one-half of the primary carcinomas and in the bone metastasis, but not in the fibroadenomas.
These observations suggest that certain breast tumors may produce high levels of, or have increased
sensitivity to, type I IFN as compared to other breast tumors (Rasmussen UB et al, supra). In addition,
the p27 gene is expressed at significant levels in normal tissues including colon, stomach and lung, but is
not expressed in placenta, kidney, liver or skin (Rasmussen UB et al, supra).

The small hydrophobic IFI gene products may contribute to viral resistance. A hepatitis-C virus 
(HCV)-induced gene, 130-51, was isolated from a cDNA library prepared from chimpanzee liver during 
the acute phase of the infection. The protein product of this gene has 97% identity to the human 6-16 
protein (Kato T et al (1992) Virology 190:856-860, the disclosure of which is incorporated herein by
reference in its entirety). The authors of this paper suggest that HCV infection actively induces IFN
expression, which in turn induces expression of IFI genes including 130-51. The IFI proteins synthesized 
in response to viral infections are known to inhibit viral functions such as penetration, uncoating, RNA or 
protein synthesis, assembly or release. The 130-51 protein may inhibit one or more of these functions in 
HCV. A particular virus may be inhibited in multiple functions by IFI proteins. In addition, the principal
inhibitory effect exerted by IFI proteins differs among the virus families (Hardman JG, supra, p 1211, the
disclosure of which is incorporated herein by reference).

The HIFI protein (PCT publication WO 9812223-A2, the disclosure of which is incorporated 
herein by reference in its entirety) is a human sequence identified among cDNAs from a library 
constructed from human neonatal kidney. Northern blot analysis using the LIFESEQ™ database (Incyte
Pharmaceuticals, Palo Alto, CA) shows that HIFI mRNA was found only in neonatal kidney. The HIFI
protein consists of 104 amino acids and has 55%, 45%, and 46% amino acid sequence identity to p27,
6-16 and 130-51, respectively.
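
Percent identity figures such as those quoted above are computed from a pairwise alignment. The following minimal Python sketch assumes that two sequences have already been aligned (gaps written as "-") and counts identical residues over the columns in which neither sequence has a gap. Other denominators, such as the length of the shorter sequence, are also in common use, and the cited references do not state here which convention was applied. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical; not taken from any sequence in this application).
toy_a = "MKLV-AGLLT"
toy_b = "MKIVSAG-LT"
print(percent_identity(toy_a, toy_b))
```

In this toy alignment, 8 columns are compared and 7 are identical.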

The hydrophobic IFI proteins of the invention may provide the basis for clinical diagnosis of 
diseases associated with their induction. These proteins may be useful in the diagnosis and treatment of
tumors, viral infections, inflammation, or conditions associated with impaired immunity. Furthermore,
these proteins may be used for investigations of the control of gene expression by IFNs and other 
cytokines in normal and diseased cells. 

Based on the chemical and structural homology among the protein of SEQ ID NO:289 and the 
small hydrophobic IFI proteins from human and chimpanzee, it is believed that the protein of SEQ ID
NO:289 is synthesized when interferons are produced in infections, inflammation, autoimmune diseases,
etc. Interferons are produced in response to various cytokines and growth factors, in viral infections,
inflammation, autoimmune diseases, and cancers. Accordingly, the protein of SEQ ID NO:289 or
fragments thereof may be used in diagnosis and treatment of diseases such as, but not limited to,
autoimmune disorders such as rheumatoid arthritis, Graves' disease, systemic lupus erythematosus,
autoimmune hepatitis, Wegener's granulomatosis, sarcoidosis, polyarthritis, pemphigus, pemphigoid,
erythema multiforme, Sjogren's syndrome, inflammatory bowel disease, multiple sclerosis, myasthenia
gravis, keratitis, scleritis, Type I diabetes, insulin-dependent diabetes mellitus, lupus nephritis, and
allergic encephalomyelitis; proliferative disorders including various forms of cancer such as leukemias,
lymphomas (Hodgkin's and non-Hodgkin's), sarcomas, melanomas, adenomas, carcinomas of solid tissue,
hypoxic tumors, squamous cell carcinomas of the mouth, throat, larynx, and lung, genitourinary cancers
such as cervical and bladder cancer, hematopoietic cancers, head and neck cancers, and nervous system
cancers, benign lesions such as papillomas, atherosclerosis, angiogenesis; viral infections, in particular
HCV and HIV infections, as well as other pathogen-induced infections (e.g. leishmania).

The protein of SEQ ID NO:289 or fragments thereof may also be used to treat conditions
associated with inflammation or immune impairment (e.g. rheumatoid arthritis, osteoarthritis and AIDS).

Another embodiment of the present invention relates to the use of the protein of SEQ ID NO:289 
or fragments thereof to treat and/or prevent the ill effects of bacterial infection during pregnancy in
mammals, such as spontaneous abortion and maternal death. In a preferred embodiment, the protein of
the invention may be used to counteract the effects of the bacterial endotoxin lipopolysaccharide (LPS).
Methods for using such compositions are described in Dziegielewska and Andersen, Biol. Neonate,
74:372-5 (1998), the disclosure of which is incorporated herein by reference in its entirety. 

Furthermore, the protein of SEQ ID NO:289 or fragments thereof are useful as reagents for
analyzing the control of gene expression by interferons and other cytokines in both normal and diseased
cells. The protein of SEQ ID NO:289 or fragments thereof may be used to identify specific molecules
with which it binds such as agonists, antagonists or inhibitors.

Another embodiment of the present invention relates to methods of using the protein of SEQ ID 
NO:289 or fragments thereof to identify and/or quantify cytokines of the interferon family as well as other 
cytokines such as IL-10 and tumor antigens, which may interact with the protein of the invention. 

The protein of SEQ ID NO:289 or fragments thereof may also be included in pharmaceutical
preparations for treating cancer or prevention/treatment of other diseases associated with changes in
expression of the protein of the invention. In another embodiment of the present invention, the protein of
SEQ ID NO:289 or fragments thereof is used to inhibit and/or modulate the effect of cytokines and
related molecules such as IL-2, TNF alpha, CTLA4, CD28, and others, by preventing the binding of the
endogenous cytokines to their natural receptors, thereby blocking cell proliferation or inhibitory signals
generated by the ligand-receptor binding event.

The protein of SEQ ID NO:289 or fragments thereof is useful to correct defects in in vivo models 
of disease such as autoimmune, inflammation and tumor models, by injecting the protein either
intraperitoneally, intravenously, subcutaneously or directly into the diseased tissue.

The DNA encoding the protein of SEQ ID NO:289 or fragments thereof is useful in diagnostic
assays for conditions/diseases associated with expression of the protein of the invention. The diagnostic
assay is useful to distinguish between absence, presence, and excess expression of the protein of the
invention and to monitor regulation of levels of the protein of the invention during therapeutic
intervention. The DNA may also be incorporated into effective eukaryotic expression vectors and
directly targeted to a specific tissue, organ, or cell population for use in gene therapy to treat the
above-mentioned conditions, including tumors, and/or to correct disease- or genetic-induced defects in any
of the above-mentioned proteins including the protein of the invention. The DNA may also be used to design
antisense sequences and ribozymes, which can be administered to modify gene expression in tumor and
pathogen-infected cells and to influence expression of cytokines and growth factors. In vivo delivery of
genetic constructs into subjects can be developed to the point of targeting specific cell types, such as
tumor cells, where expression of the protein of the invention may be affected or is modulating the expression
and/or activity of other proteins such as cytokines, growth factors, their receptors and/or tumor antigens.
The DNA is also useful to detect unknown upstream sequences (e.g. promoters and regulatory elements) by
standard techniques and for research into the control of gene expression by interferons and other
cytokines, as well as growth and transcription factors in normal and diseased cells. Hybridization probes
are useful to detect DNA encoding the protein of the invention (or closely related molecules) in biological
samples, and for mapping the naturally occurring genomic sequence to a particular
chromosome/chromosome region. The DNA may be used to generate and/or treat in vivo animal models
of disease, including susceptibility or resistance to infection, inflammation, tumors and autoimmune
conditions, as well as tumor therapy, based on vaccine, knock-out and transgene technologies.

Antibodies against the protein of SEQ ID NO:289 or fragments thereof are useful for the
diagnosis of conditions and diseases associated with its expression and to quantify the protein of the
invention (e.g. in assays to monitor patients during therapeutic intervention). Antibodies specific for the
protein may include, but are not limited to, polyclonal, monoclonal, chimeric, and single chain antibodies,
and Fab fragments produced by a Fab expression library. Neutralizing antibodies are especially preferred for diagnostics and
therapeutics. Diagnostic assays for the protein of the invention include methods utilizing the antibody and 
a label to detect the protein of the invention in human body fluids or extracts of cells or tissues.

The protein of the invention, its catalytic or immunogenic fragments, or oligopeptides thereof
can be used for screening therapeutic compounds in any of a variety of drug screening techniques,
including high-throughput screening. Methods which may be used to quantitate the expression of the
nucleotide or protein of the invention include, but are not limited to, polymerase chain reaction (PCR),
RT-PCR, RNAse protection, Northern and western blotting, enzyme-linked immunosorbent assay (ELISA),
radioimmunoassay (RIA), fluorescence-activated cell sorting (FACS), immunoprecipitation, and
chromatography.

Accordingly, the present invention includes the use of the protein of SEQ ID NO:289, fragments 
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition in an
individual. For example, the condition may be cancer, including breast cancer, viral infection, bacterial
infection, inflammation, autoimmune disorders, rheumatoid arthritis, Graves' disease, systemic lupus
erythematosus, autoimmune hepatitis, Wegener's granulomatosis, sarcoidosis, polyarthritis, pemphigus,
pemphigoid, erythema multiforme, Sjogren's syndrome, inflammatory bowel disease, multiple sclerosis,
myasthenia gravis, keratitis, scleritis, Type I diabetes, insulin-dependent diabetes mellitus, lupus
nephritis, and allergic encephalomyelitis; proliferative disorders including various forms of cancer such
as leukemias, lymphomas (Hodgkin's and non-Hodgkin's), sarcomas, melanomas, adenomas, carcinomas
of solid tissue, hypoxic tumors, squamous cell carcinomas of the mouth, throat, larynx, and lung,
genitourinary cancers such as cervical and bladder cancer, hematopoietic cancers, head and neck cancers,
and nervous system cancers, benign lesions such as papillomas, atherosclerosis, angiogenesis; viral
infections, in particular HCV and HIV infections, as well as other pathogen-induced infections (e.g.
leishmania).

In such embodiments, the protein of SEQ ID NO:289, or a fragment thereof, is administered to 
an individual in whom it is desired to increase or decrease any of the activities of the protein of SEQ ID
NO:289. The protein of SEQ ID NO:289 or fragment thereof may be administered directly to the
individual or, alternatively, a nucleic acid encoding the protein of SEQ ID NO:289 or a fragment thereof 
may be administered to the individual. Alternatively, an agent which increases the activity of the protein 
of SEQ ID NO:289 may be administered to the individual. Such agents may be identified by contacting 
the protein of SEQ ID NO:289 or a cell or preparation containing the protein of SEQ ID NO:289 with a
test agent and assaying whether the test agent increases the activity of the protein. For example, the test
agent may be a chemical compound or a polypeptide or peptide. 

Alternatively, the activity of the protein of SEQ ID NO:289 may be decreased by administering 
an agent which interferes with such activity to an individual. Agents which interfere with the activity of 
the protein of SEQ ID NO:289 may be identified by contacting the protein of SEQ ID NO:289 or a cell or
preparation containing the protein of SEQ ID NO:289 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a 
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple helix- 
forming nucleic acid. 

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify the source of a sample as, for example,
brain, fetal brain, kidney, fetal kidney, or colon, or to distinguish between two or more possible sources of
a sample on the basis of the level of the protein of SEQ ID NO:289 in the sample. For example, the
protein of SEQ ID NO:289 or fragments thereof may be used to generate antibodies using any techniques
known to those skilled in the art, including those described therein. Such antibodies may then be used to
identify tissues of unknown origin, for example, forensic samples, differentiated tumor tissue that has
metastasized to foreign bodily sites, or to differentiate different tissue types in a tissue cross-section using
immunochemistry. In such methods a sample is contacted with the antibody, which may be detectably
labeled, under conditions which facilitate antibody binding. The level of antibody binding to the test
sample is measured and compared to the level of binding to control cells from brain, fetal brain, kidney,
fetal kidney, or colon or tissues other than brain, fetal brain, kidney, fetal kidney, or colon to determine
whether the test sample is from brain, fetal brain, kidney, fetal kidney, or colon. Alternatively, the level
of the protein of SEQ ID NO:289 in a test sample may be measured by determining the level of RNA
encoding the protein of SEQ ID NO:289 in the test sample. RNA levels may be measured using nucleic
acid arrays or using techniques such as in situ hybridization, Northern blots, dot blots or other techniques
familiar to those skilled in the art. If desired, an amplification reaction, such as a PCR reaction, may be
performed on the nucleic acid sample prior to analysis. The level of RNA in the test sample is compared
to RNA levels in control cells from brain, fetal brain, kidney, fetal kidney, or colon or tissues other than
brain, fetal brain, kidney, fetal kidney, or colon to determine whether the test sample is from brain, fetal
brain, kidney, fetal kidney, or colon.

In another embodiment, antibodies to the protein of the invention or part thereof may be used for 
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:289, including using 
methods known to those skilled in the art. For example, an antibody against the protein of SEQ ID
NO:289 or a fragment thereof may be fixed to a solid support, such as a chromatography matrix. A
preparation containing cells expressing the protein of SEQ ID NO:289 is placed in contact with the
antibody under conditions which facilitate binding to the antibody. The support is washed and then the 
cells are released from the support by contacting the support with agents which cause the cells to 
dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:289 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of
SEQ ID NO:289. In such techniques, the level of the protein of SEQ ID NO:289 in an ill individual is
measured using techniques such as those described herein. The level of the protein of SEQ ID NO:289 in
the ill individual is compared to the level in normal individuals to determine whether the individual has a
level of the protein of SEQ ID NO:289 which is associated with disease.
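
The comparison described above can be sketched as follows. This minimal Python example assumes that protein levels have been measured in arbitrary units for a group of normal individuals, and it flags a test value lying more than a chosen number of standard deviations from the normal mean. The cutoff of 2 standard deviations, the function name, and all numeric values are hypothetical and are not taken from the present application.

```python
from statistics import mean, stdev


def classify_level(test_level, normal_levels, n_sd=2.0):
    """Compare a measured level to the levels in normal controls.

    Returns 'elevated', 'decreased', or 'within normal range'. The cutoff of
    n_sd standard deviations is an illustrative assumption.
    """
    if len(normal_levels) < 2:
        raise ValueError("need at least two control measurements")
    mu = mean(normal_levels)
    sd = stdev(normal_levels)
    if test_level > mu + n_sd * sd:
        return "elevated"
    if test_level < mu - n_sd * sd:
        return "decreased"
    return "within normal range"


# Hypothetical measurements in arbitrary units.
normal = [10.0, 12.0, 11.0, 9.0, 13.0]
print(classify_level(25.0, normal))
print(classify_level(11.5, normal))
print(classify_level(2.0, normal))
```

In practice the cutoff would be chosen from validated control data for the particular assay.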

Protein of SEQ ID NO:268 (internal designation 116-111-4-0-B3-CS)

The protein of SEQ ID NO:268 is encoded by the cDNA of SEQ ID NO:27. Accordingly, it will 
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:268 described
throughout the present application also pertain to the polypeptide encoded by the human cDNA of clone
116-111-4-0-B3-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic acid
of SEQ ID NO:27 described throughout the present application also pertain to the human cDNA of clone
116-111-4-0-B3-CS. The protein of the invention is found to be expressed in testis and lungs.

The protein of SEQ ID NO:268 encoded by the extended cDNA SEQ ID NO:27 is a splicing
variant of XAGE-1, a member of the CT antigen family overexpressed in Ewing sarcoma (Liu, X. F., L. J.
Helman, et al. (2000). Cancer Res 60(17): 4752-5, the disclosure of which is incorporated herein by
reference in its entirety). In addition, the protein of SEQ ID NO:268 also shows strong homology at the
COOH end with PAGE4, another member of the CT antigen family (Brinkmann, U., G. Vasmatzis, et al.
(1999) Cancer Res 59(7): 1445-8, the disclosure of which is incorporated herein by reference in its
entirety).

The cDNA SEQ ID NO:27 is composed of 5 exons. Exon 1 lies between nucleotides 1-245,
exon 2 lies between nucleotides 246-370, exon 3 lies between nucleotides 371-512, exon 4 lies between
nucleotides 513-639, and exon 5 lies between nucleotides 640-762. Exons 2 to 5 of cDNA SEQ ID
NO:27 are shared in part with XAGE-1. However, since the initiation codon of SEQ ID NO:27 is
located in intron 1 of XAGE-1, there is a frameshift in the alignment of the 2 molecules. Exon 1 of SEQ
ID NO:27 lies between nucleotides 110-234 of XAGE-1, exon 2 of SEQ ID NO:27 lies between
nucleotides 235-376 of XAGE-1, exon 3 of SEQ ID NO:27 lies between nucleotides 377-503 of XAGE-1,
and exon 4 of SEQ ID NO:27 lies between nucleotides 504-526 of XAGE-1.
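
The exon coordinates given above can be checked with simple arithmetic. The following minimal Python sketch computes the length of each interval (inclusive coordinates) in SEQ ID NO:27 and in XAGE-1, and verifies that the exons of SEQ ID NO:27 tile the cDNA without gaps. The variable names and the helper function are assumptions made for illustration; the coordinates are those listed in the text.

```python
# Exon boundaries of the cDNA of SEQ ID NO:27 (1-based, inclusive), from the text.
SEQ27_EXONS = {1: (1, 245), 2: (246, 370), 3: (371, 512), 4: (513, 639), 5: (640, 762)}

# Intervals on XAGE-1 as listed in the text, in the order given.
XAGE1_INTERVALS = [(110, 234), (235, 376), (377, 503), (504, 526)]


def interval_length(start, end):
    """Length in nucleotides of an inclusive interval."""
    return end - start + 1


seq27_lengths = {n: interval_length(s, e) for n, (s, e) in SEQ27_EXONS.items()}
xage1_lengths = [interval_length(s, e) for s, e in XAGE1_INTERVALS]

# The exons of SEQ ID NO:27 should tile the cDNA without gaps or overlaps.
bounds = [SEQ27_EXONS[n] for n in sorted(SEQ27_EXONS)]
contiguous = all(prev[1] + 1 == nxt[0] for prev, nxt in zip(bounds, bounds[1:]))

print(seq27_lengths)
print(xage1_lengths)
print(contiguous)
```

With the coordinates as listed, the first three XAGE-1 intervals (125, 142 and 127 nucleotides) have the same lengths as exons 2, 3 and 4 of SEQ ID NO:27, which is consistent with the statement that exons 2 to 5 are shared in part with XAGE-1.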

XAGE-1 is overexpressed in sarcoma and alveolar rhabdomyosarcoma and is also highly
expressed in normal testis (Liu, X. F., L. J. Helman, et al. (2000). Cancer Res 60(17): 4752-5, the
disclosure of which is incorporated herein by reference in its entirety). In addition, XAGE-1 shares
homology with PAGE-4 (Brinkmann, U., G. Vasmatzis, et al. (1999) Cancer Res 59(7): 1445-8, the
disclosure of which is incorporated herein by reference in its entirety) at the COOH end.

CT antigens are a distinct class of differentiation antigens that are expressed by cancers arising
in nonessential normal tissues such as prostate, breast, and ovary (G. Vasmatzis et al., Proc. Natl. Acad.
Sci. USA, 95: 300-304, 1998, the disclosure of which is incorporated herein by reference in its entirety)
and that have a restricted pattern of expression in normal tissues. This class of antigens is presented on
the surface of tumor cells and is recognized by cytolytic T cells, leading to lysis. To the extent that
these antigens have been studied, it has been via cytolytic T cell characterization studies in vitro, i.e., the
study of the identification of the antigen by a particular cytolytic T cell ("CTL" hereafter) subset. The
subset proliferates upon recognition of the presented tumor rejection antigen, and the cells presenting the
antigen are lysed. Characterization studies have identified CTL clones which specifically lyse cells
expressing the antigens. Examples of this work may be found in Levy et al., Adv. Cancer Res. 24: 1-59
(1977); Boon et al., J. Exp. Med. 152: 1184-1193 (1980); Brunner et al., J. Immunol. 124: 1627-1634
(1980); Maryanski et al., Eur. J. Immunol. 124: 1627-1634 (1980); Maryanski et al., Eur. J. Immunol. 12:
406-412 (1982); Palladino et al., Canc. Res. 47: 5074-5079 (1987), the disclosures of which are
incorporated herein by reference in their entireties.

Some thoroughly studied CT antigens are MAGE, BAGE, GAGE and LAGE; others have been
added, including PAGE and XAGE, most of them located on chromosome X. Brinkmann et al. reported the
identification of three new members of the GAGE/PAGE family, termed XAGEs. XAGE-1 and XAGE-2
are expressed in Ewing's sarcoma, rhabdomyosarcoma, a breast cancer, and a germ cell tumor.

It is believed that the protein of SEQ ID NO:268 is a splicing variant of XAGE-1, a CT antigen
overexpressed in Ewing sarcoma.

Accordingly, the present invention includes the use of the protein of SEQ ID NO:268, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition, such as
those listed above, associated with over- or underexpression of the protein of SEQ ID NO:268. In such
embodiments, the protein of SEQ ID NO:268, or a fragment thereof, is administered to an individual in
whom it is desired to increase or decrease any of the activities of the protein of SEQ ID NO:268. The
protein of SEQ ID NO:268 or fragment thereof may be administered directly to the individual or,
alternatively, a nucleic acid encoding the protein of SEQ ID NO:268 or a fragment thereof may be
administered to the individual. Alternatively, an agent which increases the activity of the protein of SEQ
ID NO:268 may be administered to the individual. Such agents may be identified by contacting the
protein of SEQ ID NO:268 or a cell or preparation containing the protein of SEQ ID NO:268 with a test
agent and assaying whether the test agent increases the activity of the protein. For example, the test agent
may be a chemical compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:268 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:268 may be identified by contacting the protein of SEQ ID NO:268 or a cell
or preparation containing the protein of SEQ ID NO:268 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify tissues, preferably testis and lungs, or
to distinguish between two or more possible sources of a tissue sample on the basis of the level of the
protein of SEQ ID NO:268 in the sample. For example, the protein of SEQ ID NO:268 or fragments
thereof may be used to generate antibodies using any techniques known to those skilled in the art,
including those described therein. Such tissue-specific antibodies may then be used to identify tissues of
unknown origin, for example, forensic samples, differentiated tumor tissue that has metastasized to foreign
bodily sites, or to differentiate different tissue types in a tissue cross-section using immunochemistry. In
such methods a tissue sample is contacted with the antibody, which may be detectably labeled, under
conditions which facilitate antibody binding. The level of antibody binding to the test sample is measured
and compared to the level of binding to control cells from testis or lungs or tissues other than testis or
lungs to determine whether the test sample is from testis or lungs. Alternatively, the level of the protein
of SEQ ID NO:268 in a test sample may be measured by determining the level of RNA encoding the
protein of SEQ ID NO:268 in the test sample. RNA levels may be measured using nucleic acid arrays or
using techniques such as in situ hybridization, Northern blots, dot blots or other techniques familiar to
those skilled in the art. If desired, an amplification reaction, such as a PCR reaction, may be performed
on the nucleic acid sample prior to analysis. The level of RNA in the test sample is compared to RNA
levels in control cells from testis or lungs or tissues other than testis or lungs to determine whether the test
sample is from testis or lungs.

In another embodiment, antibodies to the protein of the invention or part thereof may be used for
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:268, including Ewing
sarcoma cells, rhabdomyosarcoma cells, breast cancer cells and germ cell tumor cells, using methods
known to those skilled in the art. For example, an antibody against the protein of SEQ ID NO:268 or a
fragment thereof may be fixed to a solid support, such as a chromatography matrix. A preparation
containing cells expressing the protein of SEQ ID NO:268 is placed in contact with the antibody under
conditions which facilitate binding to the antibody. The support is washed and then the cells are released
from the support by contacting the support with agents which cause the cells to dissociate from the
antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:268 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of
SEQ ID NO:268. In some embodiments, the protein of SEQ ID NO:268 or fragments thereof may be
used to diagnose Ewing sarcoma, rhabdomyosarcoma, breast cancer or germ cell tumors. In such
techniques, the level of the protein of SEQ ID NO:268 in an ill individual is measured using techniques
such as those described herein. The level of the protein of SEQ ID NO:268 in the ill individual is
compared to the level in normal individuals. An elevated level or decreased level of the protein of SEQ
ID NO:268 relative to normal individuals suggests that the ill individual is suffering from a defect in
intercellular communication or secretion.

Another embodiment of the invention relates to compositions and methods using the protein of 
SEQ ID NO:268 or a fragment thereof as possible targets for vaccine-based therapies of cancer, including 
Ewing sarcoma, rhabdomyosarcoma, breast cancer or germ cell tumors. In such embodiments, an 
antibody against the protein of SEQ ID NO:268 or a fragment thereof is administered to an
individual suffering from cancer in an amount sufficient to ameliorate or eliminate the cancer. 

Protein of SEQ ID NO:399 (internal designation 160-40-1-0-H4-CS)

The protein of SEQ ID NO:399 is encoded by the cDNA of SEQ ID NO:158. Accordingly, it
will be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:399 described
throughout the present application also pertain to the polypeptide encoded by the human cDNA of clone
160-40-1-0-H4-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic acid
of SEQ ID NO:158 described throughout the present application also pertain to the human cDNA of clone
160-40-1-0-H4-CS. The protein of the invention is found to be expressed in testis and lungs. It is
overrepresented in fetal brain.

The protein of SEQ ID NO:399 encoded by the cDNA of SEQ ID NO:158 is homologous to
proteins of the Phosphatidic Acid Phosphatase type 2 (PAP2) superfamily (Stukey J. and Carman G.M.,
Protein Sci 1997;6:469-472, the disclosure of which is incorporated herein by reference in its entirety).
Three variants of human PAP, i.e. PAP-alpha 2 (W79285) and its alternatively spliced form PAP-alpha 1
(W79284), PAP-beta (W79286) and PAP-gamma (W79287) have been identified. The protein of SEQ ID
NO:399 displays a pfam characteristic domain of the PAP2 superfamily from positions 19 to 175.
Accordingly, one embodiment of the present invention is a polypeptide comprising amino acid residues
19 to 175 of SEQ ID NO:399. Four membrane spanning domains are predicted from amino acid positions
17 to, 47 to 67, 108 to 128, and 141 to 161. Accordingly, another embodiment of the present invention is
a polypeptide comprising one or more of the foregoing membrane spanning domains.

Phosphatidic acid phosphatase (PAP) (also referred to as phosphatidate phosphohydrolase) is
known to be an important enzyme for glycerolipid biosynthesis. In particular, PAP catalyzes the
conversion of phosphatidic acid (PA) into diacylglycerol (DAG). PA and DAG are lipids involved in
signal transduction and in structural membrane-lipid biosynthesis in cells; thus they represent an
important regulatory point in eukaryotic phospholipid metabolism. DAG is a well-studied lipid second
messenger which is essential for the activation of protein kinase C (Kent; Annu. Rev. Biochem.;
64:315-343; 1995), whereas PA itself is also a lipid messenger implicated in various signaling pathways
such as NADPH oxidase activation and calcium mobilization (English; Cell Signal.; 8:341-347; 1996, the
disclosure of which is incorporated herein by reference in its entirety). The regulation of PAP activity can
therefore affect the balance of divergent signaling processes that the cell receives in terms of PA and
DAG (Brindley et al.; Chem. Phys. Lipids 80:45-57; 1996, the disclosure of which is incorporated herein
by reference in its entirety).

PAP exists in at least two isoforms, one of which (PAP1) is presumed to be cytosolic and
membrane associated and the other (PAP2) to be an integral membrane protein (Leung D.W., Tompkins
C.K., White T.; DNA Cell Biol. 17:377-385 (1998)). The protein of the invention has 180 amino acids
and four predicted membrane-spanning segments, and so is presumed to be an integral membrane protein.

The protein of SEQ ID NO:399 is encoded by a cDNA that has homology to many alternatively
spliced forms of PAP2 genes. For example, the protein of SEQ ID NO:399 has 29% homology
with human phosphatidic acid phosphohydrolase type-2C protein. The protein of SEQ ID NO:399 also
has 40% homology with human phosphatidic acid phosphatase 2B protein. In addition, the protein of
SEQ ID NO:399 has 33% homology with human type 2 phosphatidic acid phosphatase alpha-2 protein.
PAP2-alpha2 is one of two isoforms, the other being PAP2-alpha1, presumed to be alternative splice
variants from a single gene.
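
Percentages such as those quoted above are typically derived from a pairwise sequence alignment. The following is a minimal sketch only, assuming that "homology" is measured as percent identity over aligned, ungapped positions; the source does not state which metric or alignment program was used. The function name and the toy sequences are hypothetical and are not taken from SEQ ID NO:399.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption; other conventions exist)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, hypothetical aligned sequences used only to exercise the function.
toy_a = "MKT-LLVA"
toy_b = "MRTALL-A"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence, or counting conservative substitutions as similar) give different percentages, so values are only comparable when the same convention is used throughout.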

Northern analysis has shown that PAP2-alpha mRNA expression was suppressed in several tumor
tissues, indicating that PAP-2 may act as a tumor suppressor. The relationship of PAP and tumor
suppression is further evidenced in findings that PAP activity is lower in fibroblast cell lines transformed
with either the ras or fps oncogene than in the parental rat1 cell line (Brindley et al.; Chem. Phys. Lipids
80:45-57; 1996, the disclosure of which is incorporated herein by reference in its entirety). As discussed
above, a decrease in PAP activity in transformed cells correlates with a concomitant increase in PA
concentration. Moreover, elevated PAP activity and lower levels of PA have been observed in
contact-inhibited fibroblasts relative to proliferating and transformed fibroblasts (Brindley et al.; Chem. Phys.
Lipids 80:45-57; 1996, the disclosure of which is incorporated herein by reference in its entirety).
Therefore, the protein of SEQ ID NO:399 or fragments thereof may be used to decrease cell division and
as such can provide a useful tool in treating cancer. Subsequent analysis of colon tumor tissue derived
from four donors confirmed lower expression of PAP2-alpha than in matching normal colon tissue.
Considering these data and previous demonstrations that certain transformed cell lines have lower PAP
activity, human PAP cDNAs may be used for gene therapy for certain tumors (Leung D.W., Tompkins
C.K., White T.; DNA Cell Biol. 17:377-385 (1998), the disclosure of which is incorporated herein by
reference in its entirety). Accordingly, one embodiment of the present invention is the use of the protein
of SEQ ID NO:399 or a fragment thereof as a tumor suppressor. For example, a nucleic acid expressing
the protein of SEQ ID NO:399 or a fragment thereof may be introduced into an individual suffering from
cancer in order to ameliorate or eliminate the cancer. In fact, nucleic acids encoding human phosphatidic
acid phosphatases have been used to regulate levels of lipid cellular mediators and in gene therapy of e.g.
cancer (PCT publication WO98/46730, the disclosure of which is incorporated herein by reference in its
entirety).

In another embodiment of the present invention, the protein of SEQ ID NO:399 or a fragment
thereof can be used to control the balance of lipid mediators of cellular activation and signal transduction.
The protein of the invention has 33% homology with human phosphatidic acid phosphatase 2A protein.
PAP2A is an integral membrane glycoprotein at the cell surface that plays an active role in the hydrolysis
and uptake of lipids from the extracellular space (Roberts RZ, Morris AJ; Biochim Biophys Acta 2000
Aug 24;1487(1):33-49, the disclosure of which is incorporated herein by reference in its entirety).
Accordingly, the level or activity of the protein of SEQ ID NO:399 may be modulated to influence the
rate or extent of hydrolysis and uptake of lipids from the extracellular space using methods such as those
described herein.

In another embodiment of the present invention, the protein of SEQ ID NO:399 can be used to
counterbalance the inflammatory response. PA has been implicated in cytokine induced inflammatory
responses (Bursten et al.; Circ. Shock 44:14-29, 1994; Abraham et al.; J. Exp. Med. 181:569-575, 1995;
Rice et al.; PNAS 91:3857-3861, 1994; Leung et al.; PNAS 92:4813-4817, 1995, the disclosures of which
are incorporated herein by reference in their entireties) and the modulation of numerous protein kinases
involved in signal transduction (English et al.; Chem. Phys. Lipids 80:117-132, 1996, the disclosure of
which is incorporated herein by reference in its entirety). In addition, a nucleic acid encoding the protein
of SEQ ID NO:399 or a fragment thereof may be used to counterbalance the inflammatory response from
cytokine stimulation through degradation of excess amounts of PA in cells or to treat or ameliorate
inflammatory diseases.

The gene encoding the protein of SEQ ID NO:399 or a fragment thereof can also be used in gene
therapy for the treatment of obesity associated with diabetes. PAP activity is decreased in the livers and
hearts of the grossly obese and insulin resistant JCR:LA corpulent rat compared to the control lean
phenotype (Brindley et al.; Chem. Phys. Lipids 80:45-57; 1996, the disclosure of which is incorporated
herein by reference in its entirety). The protein of the invention therefore can provide an important tool
for the treatment of obesity associated with diabetes.

Accordingly, the present invention includes the use of the protein of SEQ ID NO:399, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition, such as
those listed above, in an individual. In such embodiments, the protein of SEQ ID NO:399, or a fragment
thereof, is administered to an individual in whom it is desired to increase or decrease any of the activities
of the protein of SEQ ID NO:399, including glycerolipid biosynthesis, conversion of phosphatidic acid
into diacylglycerol, signal transduction, membrane-lipid biosynthesis, activation of protein kinase C,
NADPH oxidase activation, calcium mobilization, cell division, production of diacylglycerol,
monoacylglycerol, ceramide or sphingosine, modulation of the inflammatory response or
dephosphorylation of a substrate such as lysophosphatidic acid, ceramide 1-phosphate, or sphingosine
1-phosphate, or treatment or amelioration of obesity associated with diabetes. The protein of SEQ ID
NO:399 or fragment thereof may be administered directly to the individual or, alternatively, a nucleic acid
encoding the protein of SEQ ID NO:399 or a fragment thereof may be administered to the individual.
Alternatively, an agent which increases the activity of the protein of SEQ ID NO:399 may be
administered to the individual. Such agents may be identified by contacting the protein of SEQ ID
NO:399 or a cell or preparation containing the protein of SEQ ID NO:399 with a test agent and assaying
whether the test agent increases the activity of the protein. For example, the test agent may be a chemical
compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:399 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:399 may be identified by contacting the protein of SEQ ID NO:399 or a cell
or preparation containing the protein of SEQ ID NO:399 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify tissues, preferably brain, or to
distinguish between two or more possible sources of a tissue sample on the basis of the level of the
protein of SEQ ID NO:399 in the sample. For example, the protein of SEQ ID NO:399 or fragments
thereof may be used to generate antibodies using any techniques known to those skilled in the art,
including those described herein. Such tissue-specific antibodies may then be used to identify tissues of
unknown origin, for example, forensic samples or differentiated tumor tissue that has metastasized to foreign
bodily sites, or to differentiate different tissue types in a tissue cross-section using immunochemistry. In
such methods a tissue sample is contacted with the antibody, which may be detectably labeled, under
conditions which facilitate antibody binding. The level of antibody binding to the test sample is measured
and compared to the level of binding to control cells from brain or tissues other than brain to determine
whether the test sample is from brain. Alternatively, the level of the protein of SEQ ID NO:399 in a test
sample may be measured by determining the level of RNA encoding the protein of SEQ ID NO:399 in the
test sample. RNA levels may be measured using nucleic acid arrays or using techniques such as in situ
hybridization, Northern blots, dot blots or other techniques familiar to those skilled in the art. If desired,
an amplification reaction, such as a PCR reaction, may be performed on the nucleic acid sample prior to
analysis. The level of RNA in the test sample is compared to RNA levels in control cells from brain or
tissues other than brain to determine whether the test sample is from brain.

In another embodiment, antibodies to the protein of the invention or part thereof may be used for
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:399, including using
methods known to those skilled in the art. For example, an antibody against the protein of SEQ ID
NO:399 or a fragment thereof may be fixed to a solid support, such as a chromatography matrix. A
preparation containing cells expressing the protein of SEQ ID NO:399 is placed in contact with the
antibody under conditions which facilitate binding to the antibody. The support is washed and then the
cells are released from the support by contacting the support with agents which cause the cells to
dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:399 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of
SEQ ID NO:399. In some embodiments, the protein of SEQ ID NO:399 or fragments thereof may be
used to diagnose cancer. In such techniques, the level of the protein of SEQ ID NO:399 in an ill
individual is measured using techniques such as those described herein. The level of the protein of SEQ
ID NO:399 in the ill individual is compared to the level in normal individuals. An elevated level or
decreased level of the protein of SEQ ID NO:399 relative to normal individuals suggests that the ill
individual may suffer from cancer or be predisposed to getting cancer in the future.

In another embodiment, the present invention relates to methods of preparing a PAP protein of
SEQ ID NO:399 comprising the steps of (i) transforming a host cell with an expression vector comprising
a polynucleotide encoding SEQ ID NO:399, (ii) culturing the transformed host cells which express the
protein and (iii) isolating the protein. The present invention also relates to a method of dephosphorylating
a substrate comprising contacting the substrate with an effective amount of isolated protein of SEQ ID
NO:399 or a fragment thereof such that the protein catalyzes the dephosphorylation of the substrate. It is
further provided that this method may be carried out in vitro, and may comprise a step of isolating the
dephosphorylated substrate. Additionally, the method can be carried out in vivo, by administration of the
protein of the invention (or part of it) to a mammal in need thereof.

Protein of SEQ ID NOs:258 and 262 (internal designations 110-007-1-0-C7-CS, 116-055-1-0-A3-CS):

The protein of SEQ ID NO:258 is encoded by the cDNA of SEQ ID NO:17. Accordingly, it will
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:258 described
throughout the present application also pertain to the polypeptide encoded by the human cDNA of clone
110-007-1-0-C7-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic acid
of SEQ ID NO:17 described throughout the present application also pertain to the human cDNA of clone
110-007-1-0-C7-CS. The protein of SEQ ID NO:258 shows homologies to two high affinity IgE
receptor-like proteins (IGER) with GENESEQP accession numbers W96745 and W41056, the disclosures
of which are incorporated herein by reference in their entireties. The protein of SEQ ID NO:258 is
expressed in liver and testis. The protein of SEQ ID NO:262, encoded by SEQ ID NO:21, is a variant of
the protein of SEQ ID NO:258 and shares all the potential uses and functions described herein. This
protein and cDNA share all of the characteristics and uses of the clone 116-055-1-0-A3-CS and the
product thereof.

Like the two high affinity IgE receptor-like proteins, the protein of the invention contains four
transmembrane spanning domains of 20 amino acids, between amino acids 53-73, 79-99, 121-141 and
158-178, respectively. The protein of SEQ ID NO:258 crosses the plasma membrane four times, forming
two small extracellular loops, and has both the N- and C-termini in the cytoplasm. Moreover, the protein
of the invention contains a signal peptide (cleavage site at position 21).

The predicted structure of the protein of SEQ ID NO:258 demonstrates the relationship of this
protein to FcεRIβ and CD20 antigen and provides evidence for a family of 4-transmembrane spanning
proteins. The conservation of amino acids between all three proteins is highest in the four transmembrane
domains. While greater divergence exists in the hydrophilic amino and carboxyl termini, several amino
acids within these regions are conserved, such as the presence of 4 prolines in the amino terminus of all
three proteins. In addition, two cysteine residues (positions 147 and 156) are present in the second
extracellular domain between TM3 and TM4. This suggests that inter- or intra-molecular disulfide bonds
in this domain are present in all three proteins.
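
The topology described above can be checked from the stated coordinates. The following is a minimal sketch, assuming the transmembrane segment boundaries are 1-based and inclusive and that loops alternate between extracellular and cytoplasmic sides starting from a cytoplasmic N-terminus, as the text states. The function and variable names are illustrative only.

```python
# Transmembrane (TM) segments and cysteine positions of SEQ ID NO:258 as quoted in the text.
TM_SEGMENTS = [(53, 73), (79, 99), (121, 141), (158, 178)]
CYSTEINES = [147, 156]


def loops_between(segments):
    """Return (start, end, side) for each loop between consecutive TM segments.

    Assumes a cytoplasmic N-terminus, so the first loop is extracellular
    and the sides alternate after each membrane crossing.
    """
    loops = []
    for i, ((_, prev_end), (next_start, _)) in enumerate(zip(segments, segments[1:])):
        side = "extracellular" if i % 2 == 0 else "cytoplasmic"
        loops.append((prev_end + 1, next_start - 1, side))
    return loops


loops = loops_between(TM_SEGMENTS)
for start, end, side in loops:
    print(f"residues {start}-{end}: {end - start + 1} aa, {side}")

# The second extracellular loop lies between TM3 and TM4.
second_extracellular = [loop for loop in loops if loop[2] == "extracellular"][1]
print(all(second_extracellular[0] <= c <= second_extracellular[1] for c in CYSTEINES))
```

Under these assumptions the two extracellular loops span residues 74-78 and 142-157, and both cysteines fall within the second one, consistent with the description.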

FcεRI is part of a tetrameric receptor complex consisting of an α chain, a β chain and two γ
chains (Kinet et al. Proc. Natl. Acad. Sci. USA, 85:6483-6487 (1988), the disclosure of which is
incorporated herein by reference in its entirety). Together, they mediate interaction with IgE-bound
antigens leading to dramatic cellular responses, such as the massive degranulation of mast cells. The β
subunit is a 4-transmembrane protein with both the amino and carboxyl termini residing in the cytoplasm.

Chromosome mapping localized the cDNA of SEQ ID NO:17 to chromosome 11q12, the location of
the CD20 gene. However, the murine FcεRIβ and Ly-44 (the murine equivalent of CD20) are both
located in the same position in mouse in chromosome 19 (Tedder, T.F. et al., J. Immunol. 141:4388-4394
(1988), Clark E.A. and Lane, J.L. Annu. Rev. Immunol. 9:97-127 (1991), the disclosures of which are
incorporated herein by reference in their entireties). Therefore, the three genes are believed to have
originated and evolved from the same locus, further supporting the proposition that they are members of
the same family of related proteins.

On the basis of the foregoing information, it is believed that the protein of SEQ ID NO:258 is a
high affinity immunoglobulin E receptor-like protein.

Atopic diseases, which include allergy, asthma, atopic dermatitis (or eczema) and allergic rhinitis,
are generally defined as disorders of Immunoglobulin E (IgE) responses to common antigens, such as
pollen or house dust mites. Atopy is frequently detected by elevated total serum IgE levels, antigen
specific IgE response or positive skin tests to common allergens. In principle, atopy can result from
dysregulation of any part of the pathway that runs from antigen exposure and IgE response, through the
interaction of IgE with its receptor on mast cells, the high affinity Fc receptor FcεRI, to the subsequent
cellular activation mediated by that ligand-receptor engagement (Ravetch, Nature Genetics, 7:117-118
(1994), the disclosure of which is incorporated herein by reference in its entirety).

Accordingly, the protein of SEQ ID NO:258 or fragments comprising at least 5, 8, 10, 12, 15, 20,
25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino acids thereof, or fragments having a
desired biological activity may be administered to an individual in whom it is desired to increase or decrease
the activity of the protein of SEQ ID NO:258. In particular, the protein of SEQ ID NO:258 or fragment
thereof may be administered to an individual in whom it is desired to regulate the extent of the IgE
response. In such methods, the protein of SEQ ID NO:258 or fragment thereof may be administered
directly to the individual or, alternatively, a nucleic acid encoding the protein of SEQ ID NO:258 or a
fragment thereof may be administered to the individual. Alternatively, an agent which increases the
activity of the protein of SEQ ID NO:258 may be administered to the individual. Such agents may be
identified by contacting the protein of SEQ ID NO:258 or a cell or preparation containing the protein of
SEQ ID NO:258 with a test agent and assaying whether the test agent increases the activity of the protein.
For example, the test agent may be a chemical compound or a polypeptide or peptide.

The protein of SEQ ID NO:258 or fragments thereof may also be used to identify genes or
polypeptides that may play a role in IgE responses or atopic disease. In particular, binding partners for
the protein of SEQ ID NO:258 or the genes encoding such binding partners may be identified using a
variety of techniques familiar to those skilled in the art, including the techniques described herein.

The protein of SEQ ID NO:258 or the polynucleotide encoding the protein of SEQ ID NO:258
may also be used to diagnose hereditary atopy. In particular, the level of the protein of SEQ ID NO:258
may be determined in a test individual using methods such as those described herein and compared to the
levels of normal individuals and individuals suffering from hereditary atopy to determine whether the test
individual is suffering from or at risk of suffering hereditary atopy. Alternatively, a nucleic acid sample
may be obtained from a test individual and analyzed to determine whether it contains a level of RNA
encoding the protein of SEQ ID NO:258 which is associated with hereditary atopy or a mutation in the
gene encoding the protein of SEQ ID NO:258 which is associated with hereditary atopy. For example, a
nucleic acid sample from the test individual may be contacted with a nucleic acid probe comprising the
nucleic acid encoding the protein of SEQ ID NO:258 or a fragment thereof to determine the RNA level or
whether the individual has a mutation associated with hereditary atopy. The probe may be either DNA,
including cDNA or genomic DNA, or the probe may be RNA. Any of the methods familiar to those
skilled in the art may be used in these diagnostic methods, including the methods described herein. For
example, the presence of a mutation associated with hereditary atopy can be determined using methods
generally known in the art, such as but not limited to PCR, sequencing or minisequencing as described in
the method of Yamamoto et al. (Biochem. Biophys. Res. Comm., 182:507 (1992), the disclosure of which
is incorporated by reference herein in its entirety).

The protein of SEQ ID NO:258 can also be used to characterize the induction of expression of
FcεRI and the particular function of FcεRIβ. As such, the protein of the invention can be useful in, for
example, the design of drugs that block or inhibit induction or activity of FcεRI, thereby treating atopic
diseases. In particular, test agents which block or inhibit induction or activity may be identified using the
methods described herein.

In another embodiment, the protein of SEQ ID NO:258 can be employed in the preparation of
antibodies, such as monoclonal antibodies, according to methods known in the art, including those
described herein. The antibodies can be used to block or mimic ligand binding to the receptor comprising
the protein of the invention or other receptors, such as but not limited to FcεRI. The antibodies can also
be used to isolate the protein of SEQ ID NO:258 or cells which express the protein of SEQ ID NO:258
using methods such as those described herein. For example, the antibodies may be used to measure the
presence of cells containing the protein of SEQ ID NO:258 (including but not limited to hematopoietic
cells) in a sample. For example, the method comprises contacting the sample with the antibody under
conditions sufficient for the antibody to bind to the protein of SEQ ID NO:258 and detecting the presence
of bound antibody using methods known in the art, including those described herein.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify tissues, preferably liver and testis, or
to distinguish between two or more possible sources of a tissue sample on the basis of the level of the
protein of SEQ ID NO:258 in the sample. For example, the protein of SEQ ID NO:258 or fragments
thereof may be used to generate antibodies using any techniques known to those skilled in the art,
including those described herein. Such tissue-specific antibodies may then be used to identify tissues of
unknown origin, for example, forensic samples or differentiated tumor tissue that has metastasized to foreign
bodily sites, or to differentiate different tissue types in a tissue cross-section using immunochemistry. In
such methods a tissue sample is contacted with the antibody, which may be detectably labeled, under
conditions which facilitate antibody binding. The level of antibody binding to the test sample is measured
and compared to the level of binding to control cells from liver or testis or tissues other than liver or testis
to determine whether the test sample is from liver or testis. Alternatively, the level of the protein of SEQ
ID NO:258 in a test sample may be measured by determining the level of RNA encoding the protein of
SEQ ID NO:258 in the test sample. RNA levels may be measured using nucleic acid arrays or using
techniques such as in situ hybridization, Northern blots, dot blots or other techniques familiar to those
skilled in the art. If desired, an amplification reaction, such as a PCR reaction, may be performed on the
nucleic acid sample prior to analysis. The level of RNA in the test sample is compared to RNA levels in
control cells from liver or testis or tissues other than liver or testis to determine whether the test sample is
from liver or testis.

Protein of SEQ ID NO:279 (internal designation 160-58-3-0-H3-CS)

The protein of SEQ ID NO:279 is encoded by the cDNA of SEQ ID NO:38. Accordingly, it will
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:279 described
throughout the present application also pertain to the polypeptide encoded by a nucleic acid included in
clone 160-58-3-0-H3-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic
acid of SEQ ID NO:38 described throughout the present application also pertain to the nucleic acid
included in clone 160-58-3-0-H3-CS.

The protein of SEQ ID NO:279 is encoded by a nucleic acid of 1330 nucleotides with an ORF
from nt 198 to 998 yielding a 267 amino acid protein. The protein is a polymorphic variant of the
sequence (SP:P01210) for proenkephalin A precursor (contains Met- and Leu-enkephalins). It has a
signal peptide spanning 24 amino acids and 2 signature motifs for vertebrate endogenous opioid
neuropeptides and endogenous opioid neuropeptide precursors. PSORT gives a predicted extracellular
localization, including the cell wall (66.7%). The protein of SEQ ID NO:279 is primarily distributed in the
fetal brain, although expression in other tissues has also been shown (see below). The polymorphic
variation is found at amino acid position 75 (E->D, a conservative amino acid change). After signal
peptide cleavage (amino acids 47 to 267; 220 amino acids), the protein still contains the polymorphic
variation, which is now at amino acid position 29. This does not change the sequence of any of the
different enkephalins that result after cleavage of this precursor protein. In addition, the polymorphism is
25 amino acids away from the first cleavage site on the amino terminal side. This is unlikely to change
the secondary structure of the actual cleavage site.
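
The coordinates quoted in the preceding paragraph can be checked arithmetically. The following is a minimal sketch, assuming all nucleotide and residue positions are 1-based and inclusive (the source does not state the convention); the function names are illustrative only.

```python
def codon_count(orf_start_nt: int, orf_end_nt: int) -> int:
    """Number of codons in an ORF given 1-based inclusive nucleotide coordinates."""
    length_nt = orf_end_nt - orf_start_nt + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_nt // 3


def renumber_after_cleavage(position: int, first_retained_residue: int) -> int:
    """Position of a residue once all residues before first_retained_residue are removed."""
    if position < first_retained_residue:
        raise ValueError("residue is removed by the cleavage")
    return position - first_retained_residue + 1


# Values quoted in the text for SEQ ID NO:279.
print(codon_count(198, 998))            # codons spanned by nt 198 to 998
print(renumber_after_cleavage(75, 47))  # polymorphic residue, renumbered from residue 47
```

Under these assumptions the ORF spans 801 nucleotides, i.e. 267 codons, matching the stated 267 amino acids, and residue 75 becomes residue 29 of the cleaved protein, as stated. That the span divides into exactly 267 codons suggests the quoted coordinates exclude the stop codon; this is an inference, not a statement from the source.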

PCT publication WO9606863-A1, the disclosure of which is incorporated herein by reference in 
its entirety, discloses a protein having high homology with the protein of SEQ ID NO:279. 

Accordingly, the protein of SEQ ID NO:279 is believed to be an enkephalin. Met- and Leu-enkephalins
compete with and mimic the effects of opiate drugs. These two pentapeptides with potent
opiate agonist activity in bioassay systems were originally identified by Hughes et al. (Nature, 258,
577-580, 1975). The natural ligands for opiate receptors, which differ only in their COOH terminal amino
acid, were named Met- and Leu-enkephalin to reflect their origin from the brain. Peptides containing
these sequences are termed opiate or opioid peptides. Enkephalins are widely distributed throughout the
central nervous system in enkephalinergic neuronal networks, and also exist in the peripheral nervous
system, for example in autonomic ganglia. Data, largely circumstantial, suggest wide-ranging
involvement of endogenous opioids, for example in the modulation of pain perception, in mood and
behaviour, learning and memory, responses to stress, diverse neuroendocrine functions, immune
regulation and cardiovascular and respiratory function.

Met-enkephalin enhances the immune reaction in patients with cancer or AIDS. It can bind opioid
receptors present in peripheral inflamed tissues to mediate an analgesic effect.

After exogenous administration of the different enkephalins, several immunologic functions are
affected, including antibody production, NK cell activity against tumors and viral infections, macrophage
and polymorphonuclear leukocyte functions, graft rejections, and mitogen-stimulated lymphocyte
proliferation. The effects can be bi-directional, where low concentrations enhance, and high
concentrations inhibit the same immune function. Thus, enkephalins are modulators of immune reactions.

These opioid neuropeptides are released by post-translational proteolytic processing of precursor
proteins. These multivalent precursor proteins (polyproteins) consist of a signal sequence followed by a
conserved region of about 50 residues, a variable length region and the sequence of the various
neuropeptides. The preproenkephalin A (gene PENK) is processed to produce the following peptides
which include Met-enkephalin (6 copies, 2 of which are extended) and Leu-enkephalin:

Signal peptide 1-24

Peptide 100-104 Met-enkephalin 1

Peptide 107-111 Met-enkephalin 2

Peptide 136-140 Met-enkephalin 3

Peptide 186-193 Met-enkephalin-arg-gly-leu

Peptide 210-214 Met-enkephalin 4

Peptide 230-234 Leu-enkephalin

Peptide 261-267 Met-enkephalin-arg-phe
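
The coordinate list above can be checked for internal consistency. The following is a minimal sketch, assuming 1-based inclusive residue positions and using the standard enkephalin sequences (Met-enkephalin YGGFM, Leu-enkephalin YGGFL); these sequences are general knowledge and are not read from SEQ ID NO:279, and the helper names are illustrative only.

```python
MET_ENK = "YGGFM"  # Tyr-Gly-Gly-Phe-Met
LEU_ENK = "YGGFL"  # Tyr-Gly-Gly-Phe-Leu

# (start, end, name, expected sequence); coordinates as listed in the text.
PEPTIDES = [
    (100, 104, "Met-enkephalin 1", MET_ENK),
    (107, 111, "Met-enkephalin 2", MET_ENK),
    (136, 140, "Met-enkephalin 3", MET_ENK),
    (186, 193, "Met-enkephalin-arg-gly-leu", MET_ENK + "RGL"),
    (210, 214, "Met-enkephalin 4", MET_ENK),
    (230, 234, "Leu-enkephalin", LEU_ENK),
    (261, 267, "Met-enkephalin-arg-phe", MET_ENK + "RF"),
]


def inconsistent_entries(peptides):
    """Names of peptides whose coordinate span differs from their sequence length."""
    return [name for start, end, name, seq in peptides if end - start + 1 != len(seq)]


def met_enkephalin_copies(peptides):
    """Return (plain, extended) counts of Met-enkephalin-containing peptides."""
    plain = sum(1 for _, _, _, seq in peptides if seq == MET_ENK)
    extended = sum(
        1 for _, _, _, seq in peptides
        if seq.startswith(MET_ENK) and len(seq) > len(MET_ENK)
    )
    return plain, extended


print(inconsistent_entries(PEPTIDES))
print(met_enkephalin_copies(PEPTIDES))
```

Under these assumptions every span matches its peptide length, and the list contains four plain and two C-terminally extended Met-enkephalin copies, in line with the "6 copies, 2 of which are extended" stated in the text.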

The conserved region in the N-termini of these precursors contains six cysteines that are probably 
involved in disulfide bonds. This region could also be important for the processing of the neuropeptides. 

The precursor protein does have the potential to be differentially cleaved into multiple extended
enkephalin and non-enkephalin-containing peptides, the functions of which are largely unknown;
however, in some cases it has been shown that extended enkephalin-containing peptides have enhanced
opiate activity. Another peptide, enkelytin, is produced that exhibits anti-bacterial activity (see below).

There is a growing body of evidence that proenkephalin exists largely independently of free 
enkephalin peptides in a number of tissues and cell types including astrocytes (Melner et al, EMBO J, 9, 
791-796, 1990; Spruce et al, EMBO J 9, 1787-1795, 1990, the disclosures of which are incoporated 
15 herein by reference in their entireties), and is released from these cells in an unprocessed form (Batter et 
al, Brain Res. 563, 28-32, 1991, the disclosure of which is incorporated herein by reference in its 
entirety). There is evidence in some cases that processing enzymes are co-released along with the 
unprocessed precursor which suggests that extracellular cleavage may occur (Vilijn et al, J. Neurochem. 
53, 1487-1493, 1989). Even if biological activity is signalled through binding of the small peptide 
products to cell surface receptors, the regulation of this activity may be mediated through the precursor,
and it is also possible that the unprocessed precursor has an additional intracellular role of its own. 

This protein was originally described to be present in various brain regions, most notably in the 
striatum as well as in neuroendocrine tissues, the pituitary and adrenal gland. It is also expressed in a 
variety of immune cells, including ConA-stimulated CD4 T lymphocytes, CD4 thymocytes, B
lymphocytes, as well as T cell lines, macrophages and mast cells. Expression has been reported in the
reproductive system, heart and many developing tissues during gestation and the early postnatal period.
Because of this, it has been postulated that these peptides play a role in cell or tissue growth and
differentiation. For example, endogenous enkephalins induced in thymocytes modulate their own
expression and function to inhibit the proliferation of activated thymocytes.

Enkephalin peptides are abundant in adrenal medulla and can be released by neurotransmitters
specific for that tissue. Enkephalins have also been found to be abundant in human phaeochromocytoma,
a tumour derived from the adrenal medulla. The RNA from this tumour contains a high level of 
enkephalin mRNA sequences as demonstrated by cell-free translation studies.

Enkephalins function through opiate receptors, which are classified as delta, kappa and mu. A study by Lord
et al (Nature, 267, 495-499, 1977) compared the activity of morphine and enkephalins in bioassay 
systems, and found that enkephalins bound predominantly to delta receptors. Subsequent studies have 
revealed homology of these receptors to other receptor families, including the immunoglobulin 
superfamily member OBCAM (Schofield et al, EMBO J 8, 489-495, 1989, the disclosure of which is
incorporated herein by reference in its entirety) and somatostatin receptors (PCT publication
WO96/06863, the disclosure of which is incorporated herein by reference in its entirety). This would
explain the reported opioid binding properties of the former. Because of the latter's homology to opiate
receptors, it would also be expected to bind opioid receptor ligands. The recognition of opioid peptides
by other non-opiate related receptors implies that these peptides may exert other as yet unknown
functions. 

Enkephalins are also involved in apoptosis. Apoptosis is the morphologically distinct process of 
controlled cell death which balances the process of cell production by mitosis. A molecular connection 
between control of cell production and cell elimination has now been established, including the roles of
c-myc and p53 in the pathways mediating apoptotic cell death. It has been proposed that all mammalian
cells may be programmed to die by default in the absence of continuous signalling from neighboring 
cells. However, the acquisition of a survival advantage which prevents a single cell from activating its 
suicide program in response to levels of genetic damage associated with common environmental insults 
could theoretically be an initiating event in oncogenesis since it would favor the persistence of potentially
tumorigenic mutations. Alternatively, inappropriate activation of survival pathways might lead to
overriding the intrinsic death program and promote tumorigenesis at early and late stages. A particularly 
potent oncogenic pathway would be one which both promoted and tolerated genetic damage and helped a 
cell overcome its need for extracellular survival signals. Approximately 50% of human tumors possess 
normal p53 function. Thus, additional pathways or molecules which inappropriately repress apoptosis in
human tumours remain to be identified. Opioid-like molecules could be involved in such a pathway.

There are published reports that pathways which include opioid-like molecules participate in 
regulating the equilibrium between cell death and survival. For example, morphine inhibits cell survival 
in the developing cerebellum (Hauser et al, Exp. Neurol, 130, 95-105, 1994, the disclosure of which is 
incorporated herein by reference in its entirety) and induces apoptosis in thymocytes (Fuchs and Pruett, J.
Pharmacol. Exp. Ther. 266, 417-423, 1993, the disclosure of which is incorporated herein by reference in
its entirety). 

In a series of experiments (PCT publication WO 96/06863), it has been found that proenkephalin 
and/or its proteolytic products act as extracellular and/or cell surface membrane bound factors which 
modulate cell survival in transformed cells a) upon deprivation of exogenous survival factors, and b)
following genotoxic injury and/or stress when exogenous survival factors are non-limiting. The
receptor(s) to which these factor(s) bind, which are most likely to exist on the cell surface, are related, or
possibly identical, to one or more members of the opioid receptor family. 

Opioid-like receptor types or subtypes can mediate survival or death; receptor(s) which mediate
death appear to be coupled to those which mediate survival. Natural ligands for these receptors are likely
to be products of the opioid precursor genes, although natural ligands could include cytokines which 
mimic their effect. Tumour cells are more sensitive to antagonism of opioid-like receptor-mediated 
survival, and to stimulation of opioid-like receptor-mediated death, than non-transformed cells. The 
induction of cell cycle arrest enhances the sensitivity of tumour cells to these manipulations. (Enhanced
sensitivity of tumour cells to these manipulations is induced by their synchronisation within the cell cycle.)

Cytoplasmic proenkephalin and/or its proteolytic products act as general repressors of apoptosis. 
Agents which, if coupled to appropriate internalisation agents, would antagonise cytoplasmic 
proenkephalin would therefore be of use in the induction of apoptosis in non-transformed as well as 
transformed cells, particularly in combination with sublethal doses of known apoptosis-inducing agents. 

The repression of apoptosis mediated through cytoplasmic proenkephalin is activated at high cell
density predominantly by nondiffusable factors. Inhibition of proenkephalin or its products as described
above would therefore be potentiated if agents were used in combination, for example, with neutralising
antibodies to integrins (such as the antibody 23C6; Bates et al., J. Cell Biol. 125, 403-415, 1994) to reduce
exogenous survival signaling and simulate low density. 

Proenkephalin targeted to the cell nucleus induces apoptotic death, which is inhibited by the
overexpression of large T antigen and is at least partly mediated through p53. Tumors which retain
wild-type p53 function are therefore a particular target for apoptosis induction by agents which increase 
the levels of proenkephalin, or its derivatives, within the nucleus or which mimic the function of nuclear 
proenkephalin or its derivatives. 

Accordingly, the protein of SEQ ID NO:279, fragments thereof, or nucleic acids encoding the
protein of SEQ ID NO:279 may be used to modulate a biochemical pathway in which products of opioid
peptide precursor genes participate. In some embodiments, antibodies or other agents which reduce the 
level or activity of the protein of SEQ ID NO:279 or fragments thereof may be used to induce apoptosis 
in cells. The agents preferably neutralize the protein of SEQ ID NO:279 or its proteolytic derivatives,
increase the level of, activate or mimic nuclear proenkephalin, or act as an antagonist to receptors related
or identical to the delta and kappa opioid receptors. In some embodiments, the agent may be a 
neutralizing monoclonal antibody against the protein of SEQ ID NO:279 or a fragment thereof. The 
agent may also be a fragment or allelic form of one of these antibodies. A cytoplasmic anchor, or a 
nuclear localization signal may also be included in the agent. In some embodiments, the agent is able to
modulate a biochemical pathway in a cell in which products of opioid peptide precursor genes participate
in order to induce apoptosis. The agents can be used for the treatment of cancer or for inducing apoptosis 
in lens cells following a cataract operation. In some embodiments, the agents promote apoptosis of 
proliferating cells with less, or no, effect on normal mature cell types. The agents may be administered in 
combination with a genotoxic or cell cycle arrest agent. Alternatively, the agent may be complexed with
a chemotherapeutic, irradiation or cell cycle arrest (synchronization) agent.

Accordingly, the invention provides a means of inducing apoptosis in cells which comprises 
modifying a biological pathway of a cell in which a product of an opioid precursor gene participates in 
such a way that apoptosis is induced. Modification of the pathway is suitably effected by administration
of an appropriate agent. In particular, the present invention provides an agent for use in inducing
apoptosis in cells, said agent comprising an agent able to neutralise proenkephalin or its proteolytic
derivatives; an agent which increases the level of nuclear proenkephalin and/or its derivatives, or which
activates or mimics them; an agent which acts as an antagonist at receptor(s) related or identical to the
delta opioid receptor; or an agent which acts as an agonist at receptor(s) related or identical to the kappa
opioid receptor.

A subset of such agents are agents able to neutralise proenkephalin or its proteolytic derivatives, 
or an agent which acts as an antagonist at receptor(s) related or identical to the delta opioid receptor, or an 
agent which acts as an agonist at receptor(s) related or identical to the kappa opioid receptor. 

In some embodiments, the agent may be administered to the cell surface whereupon the survival
effects of extracellular and/or cell surface membrane bound proenkephalin or its proteolytic derivatives are
neutralised, causing the cell to become apoptotic. Alternatively, an agent able to neutralise proenkephalin
or its proteolytic derivatives may be coupled to an internalisation peptide and a cytoplasmic anchor. Such
an assembly will remain in the cytoplasm of the cell, antagonising cytoplasmic proenkephalin and/or its
proteolytic products and thus neutralising the apoptosis repressor effect of these molecules. 

Enkephalins also have anti-bacterial activity. During processing of proenkephalin-A,
maturation in the adrenal medullary chromaffin cell starts with the removal of the carboxy-terminal end
(proenkephalin-A-derived peptide or PEAP209-239) (Y. Goumon, K. Lugardon, B. Kieffer et al. J. Biol.
Chem. 273:29847-29856, 1998, the disclosure of which is incorporated herein by reference in its
entirety). The peptide enkelytin was identified as corresponding to bisphosphorylated PEAP209-237, and
possesses antibacterial activity against Staphylococcus aureus and other gram-positive bacteria such as
Micrococcus luteus and Bacillus megaterium (0.2-0.4 range). There is no ability to affect the growth of
gram-negative bacteria (E. coli strains D22, D31, 663 and T13773), nor is there any hemolytic activity.
The activity of this peptide is specific - shorter versions of the peptide (209-220, 224-237, 230-237, 233- 
237) or non-phosphorylated PEAP209-239 exhibited little to no bacterial growth inhibiting activity.

Bovine periarthritis abscess fluid contains different forms of PEAP (72-237/239; 80-237/239) as
identified by immunoreactivity and confirmed by sequence analysis. These peptides have activity against 
M. luteus, but are less active than enkelytin (5 versus 0.2 nM). These PEAP constitute a pool of 
precursors which have to be processed, during infection, to provide active enkelytin. Presence of a PEAP
at a molecular mass corresponding to that of PEAP209-237 was detected as well. PEAPs (PEAP202-238 and
PEAP206-237) have also been detected in wound fluids, including bovine post-caesarean abscess in the 
subcutaneous lining, and an abscess induced by subcutaneous injection of complete Freund's adjuvant. 
Therefore, these peptides are present in wound fluids along with other known antibacterial peptides 
(defensins, bactenecins). The concentrations were in a range similar to that found to be active in vitro
(0.5-1 nM). The PEAPs have also been detected in secretions from human polymorphonuclear
neutrophils. 

The PEAP209-230 and enkelytin are secreted from cultured chromaffin cells following 
stimulation. This suggests that these two peptides are co-released with catecholamines in stress situations 
and may therefore play an important role in defense mechanisms. 

Co-release of met-enkephalin and enkelytin would represent a unified neuroimmune protective
response to stress situations that may be accompanied by infectious diseases. This would provide a
highly beneficial survival strategy at the very beginning of proinflammatory processes. This protein
would therefore play an important role in host defense against microbial infections, especially those
involving gram-positive bacteria. Due to their nonspecific activity on membranes, the antibacterial
peptides possess cytotoxic activities and may not only play a role in antimicrobial defense, but also in
inflammatory processes, possibly in wound repair. 

The protein of SEQ ID NO:279, peptides derived by cleavage thereof or fragments thereof could 
be used as antibacterial agents in creams/ointments/solutions, presoaked bandages, or dermal-type patches 
for external applications. Alternatively, the protein of SEQ ID NO:279, peptides derived by cleavage
thereof, or fragments thereof may be used in injections (intravenously, subcutaneously or
intraperitoneally). This is useful for wound repair, burn healing, and post-operative recovery management.

Alternatively, the protein of SEQ ID NO:279, peptides derived by cleavage thereof, or fragments 
thereof, may be incorporated into disinfectant solutions used for cleaning surfaces such as in the house
(kitchen, bathroom) or in the office (desktops, phones, computer keyboards and mice). Other
applications are as additives in mouthwash or handi-popup wipes.

Altered levels of enkephalins may produce psychological disease. Konig et al (Nature, 383,
535-538, 1996, the disclosure of which is incorporated herein by reference in its entirety) used a genetic
approach to study the role of the mammalian opioid system. They disrupted the pre-proenkephalin gene 
using homologous recombination in embryonic stem cells to generate enkephalin-deficient mice. Mutant
enk -/- animals are healthy, fertile, and care for their offspring, but display significant behavioral
abnormalities. Mice with the enk -/- genotype are more anxious and males display increased offensive 
aggressiveness. Mutant animals show marked differences from controls in supraspinal, but not in spinal, 
responses to painful stimuli. These enk -/- mice do however exhibit normal stress-induced analgesia. 
Therefore, enkephalins modulate responses to painful stimuli. Thus, genetic factors may contribute
significantly to the experience of pain. This study clearly indicates the importance of enkephalins in pain 
perception, anxiety and aggressiveness. 

Interestingly, the PENK gene is localized on 8q23-q24, the same locus on which are found genes
related to epilepsy and spastic paraplegia, disorders related to brain dysfunction.

Accordingly, the protein of SEQ ID NO:279 or fragments thereof may be used for the treatment
of psychological disorders, especially those involving distortion in the perception of pain, aggressiveness,
or anxiety. This would include drug addiction, different types of phobias, panic attacks, schizophrenia,
bipolar disorder, anorexia nervosa, chronic pain disorders, post-traumatic events, and post-operative pain
management.

Accordingly, the present invention includes the use of the protein of SEQ ID NO:279, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition in an 
individual. For example, the condition may be cancer, a condition resulting from increased or decreased 
cellular proliferation, bacterial infection, conditions resulting from abnormal immune responses,
psychological disease or any of the conditions listed above. In such embodiments, the protein of SEQ ID
NO:279, or a fragment thereof, is administered to an individual in whom it is desired to increase or 
decrease any of the activities of the protein of SEQ ID NO:279. The protein of SEQ ID NO:279 or 
fragment thereof may be administered directly to the individual or, alternatively, a nucleic acid encoding 
the protein of SEQ ID NO:279 or a fragment thereof may be administered to the individual. 

Alternatively, an agent which increases the activity of the protein of SEQ ID NO:279 may be
administered to the individual. Such agents may be identified by contacting the protein of SEQ ID 
NO:279 or a cell or preparation containing the protein of SEQ ID NO:279 with a test agent and assaying 
whether the test agent increases the activity of the protein. For example, the test agent may be a chemical 
compound or a polypeptide or peptide. 

Alternatively, the activity of the protein of SEQ ID NO:279 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:279 may be identified by contacting the protein of SEQ ID NO:279 or a cell or 
preparation containing the protein of SEQ ID NO:279 with a test agent and assaying whether the test 
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the 
invention or part thereof as a marker protein to selectively identify the source of a sample as, for example, 
fetal brain, or to distinguish between two or more possible sources of a sample on the basis of the level of
the protein of SEQ ID NO:279 in the sample. For example, the protein of SEQ ID NO:279 or fragments
thereof may be used to generate antibodies using any techniques known to those skilled in the art, 
including those described herein. Such antibodies may then be used to identify tissues of unknown
origin, for example, forensic samples, differentiated tumor tissue that has metastasized to foreign bodily
sites, or to differentiate different tissue types in a tissue cross-section using immunochemistry. In such
methods a sample is contacted with the antibody, which may be detectably labeled, under conditions 
which facilitate antibody binding. The level of antibody binding to the test sample is measured and 
compared to the level of binding to control cells from fetal brain or tissues other than fetal brain to 
determine whether the test sample is from fetal brain. Alternatively, the level of the protein of SEQ ID
NO:279 in a test sample may be measured by determining the level of RNA encoding the protein of SEQ
ID NO:279 in the test sample. RNA levels may be measured using nucleic acid arrays or using
techniques such as in situ hybridization, Northern blots, dot blots or other techniques familiar to those
skilled in the art. If desired, an amplification reaction, such as a PCR reaction, may be performed on the 
nucleic acid sample prior to analysis. The level of RNA in the test sample is compared to RNA levels in
control cells from fetal brain or tissues other than fetal brain to determine whether the test sample is from
fetal brain. 

In another embodiment, antibodies to the protein of the invention or part thereof may be used for 
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:279, including using 
methods known to those skilled in the art. For example, an antibody against the protein of SEQ ID
NO:279 or a fragment thereof may be fixed to a solid support, such as a chromatography matrix. A
preparation containing cells expressing the protein of SEQ ID NO:279 is placed in contact with the 
antibody under conditions which facilitate binding to the antibody. The support is washed and then the 
cells are released from the support by contacting the support with agents which cause the cells to 
dissociate from the antibody. 

In another embodiment of the present invention, the protein of SEQ ID NO:279 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of
SEQ ID NO:279. In such techniques, the level of the protein of SEQ ID NO:279 in an ill individual is 
measured using techniques such as those described herein. The level of the protein of SEQ ID NO:279 in
the ill individual is compared to the level in normal individuals to determine whether the individual has a
level of the protein of SEQ ID NO:279 which is associated with disease. 

Protein of SEQ ID NO: 293 (internal designation 181-16-1-0-G7-CS)

The protein of SEQ ID NO: 293 has a high degree of homology with HSPC163 (Genbank 
accession number AF161512), the protein encoded by gene no: 93 (PCT/US99/17130) and the human
cornichon protein TGAM77. SEQ ID NO: 293 is overexpressed in cancerous prostate, fetal brain and 
fetal kidney. 

The gene HSPC163 is one of three hundred cDNAs obtained from a CD34+ hematopoietic stem/
progenitor cell (HSPC) library (obtained from umbilical cord blood and adult bone marrow). HSPC163
has also been identified in five hematopoietic cell lines: NB4 (granulocytic), HL60 (granulocytic),
U937 (monocytic), K562 (erythro-megakaryocytic), and Jurkat (T lymphocytic). These cell lines 
represent the distinct lineages of hematopoietic cells. 

The polypeptide of gene no: 93 has been determined to have two transmembrane domains and a 
short cytoplasmic tail. Based upon these characteristics, it is believed that the protein product of gene no:
93 shares structural similarity to type IIIa membrane proteins. This gene is expressed primarily in
activated T-cells and to a lesser extent in endometrial tumor, T cell helper II cells, microvascular
endothelial cells, Raji cells treated with cycloheximide and umbilical vein endothelial cells. The
expression pattern of gene no: 93 indicates a role in regulating the proliferation, survival, differentiation,
and/or activation of hematopoietic cell lineages, including blood stem cells. The gene product appears to
be involved in the regulation of cytokine production, antigen presentation, and other immune processes,
suggesting a usefulness in boosting the immune system. The translation product of this gene has high 
homology to the human TGAM77 and mouse cornichon proteins. 

TGAM77 was identified as a gene involved in the early phase of T-cell activation in response to
alloantigens. Twenty-four hours after T-cell allostimulation, RNA expression of TGAM77 is
significantly increased. TGAM77 has been designated as a T-cell growth associated molecule. TGAM77
is a human homolog of cornichon (cni) protein of the fruit fly Drosophila. 

Cornichon was demonstrated to be involved in carefully orchestrated signaling events during 
Drosophila oogenesis establishing an asymmetric pattern in the oocyte as a prerequisite for correct 
embryogenesis. Cornichon signaling functions in concert with two other proteins. The function of all
three genes in an EGF-like signaling pathway appears to direct the formation of a correctly polarized
microtubule cytoskeleton, which is thought to be the basis for the correct spatial localization of other
signaling molecules essential for oocyte polarization, asymmetric movement of the nucleus, and embryo
differentiation.

The subject invention provides the amino acid sequence of SEQ ID NO: 293 and polynucleotide
sequences encoding the amino acid sequence of SEQ ID NO: 293. In one embodiment, the polypeptides
of SEQ ID NO: 293 are interchanged with the corresponding polypeptides encoded by the human cDNA
of clone 181-16-1-0-G7-CS. Also included in the invention are biologically active fragments of SEQ ID
NO: 293 and polynucleotide sequences encoding these biologically active fragments. "Biologically
active fragments" are defined as those peptide or polypeptide fragments of SEQ ID NO: 293 which have 
at least one of the biological functions of the full length protein (e.g., the ability to stimulate T-cell 
proliferation). 

The invention also provides variants of SEQ ID NO: 293. These variants have at least about
80%, more preferably at least about 90%, and most preferably at least about 95% amino acid sequence
identity to the amino acid sequence of SEQ ID NO: 293. Variants according to the subject invention also
have at least one functional or structural characteristic of SEQ ID NO: 293, such as the biological
functions described above. The invention also provides biologically active fragments of the variant
proteins. Unless otherwise indicated, the methods disclosed herein can be practiced utilizing SEQ ID
NO: 293 or variants thereof. Likewise, the methods of the subject invention can be practiced using
biologically active fragments of SEQ ID NO: 293, or variants of said biologically active fragments.
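
To illustrate how an identity threshold of this kind might be evaluated, the following minimal sketch computes percent identity between two sequences that have already been aligned. It is an assumption-laden example and not the method of the specification: it assumes equal-length aligned strings with "-" as the gap character, and it counts every aligned column (including gapped ones) in the denominator, which is only one of several conventions in use. The function names percent_identity and identity_tier are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pairwise alignment.

    Assumes both strings come from the same alignment (equal length).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def identity_tier(pct):
    """Return the highest of the 95/90/80 thresholds met, or None."""
    for threshold in (95, 90, 80):
        if pct >= threshold:
            return threshold
    return None


# Toy example: two five-residue peptides differing at one position.
pct = percent_identity("YGGFM", "YGGFL")
tier = identity_tier(pct)
```

In practice the alignment itself would come from an established alignment program, and the choice of denominator should be stated alongside any reported identity value.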

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode
SEQ ID NO: 293. It is well within the skill of a person trained in the art to create these alternative DNA
sequences which encode proteins having the same, or essentially the same, amino acid sequence. These
variant DNA sequences are, thus, within the scope of the subject invention. As used herein, reference to
"essentially the same" sequence refers to sequences that have amino acid substitutions, deletions, 
additions, or insertions that do not materially affect biological activity. Fragments retaining one or more 
characteristic biological activities of SEQ ID NO: 293 are also included in this definition.

"Recombinant nucleotide variants" are alternate polynucleotides which encode a particular
protein. They can be synthesized, for example, by making use of the "redundancy" in the genetic code.
Various codon substitutions, such as the silent changes which produce specific restriction sites or codon 
usage-specific mutations, can be introduced to optimize cloning into a plasmid or viral vector or 
expression in a particular prokaryotic or eukaryotic host system, respectively. 
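
The redundancy referred to above can be made concrete with a short sketch. The example below builds the standard genetic code table, counts how many distinct DNA sequences encode a given peptide, and confirms that two different DNA sequences translate to the same peptide. It is illustrative only: the peptide YGGFM (Met-enkephalin) is used as a convenient toy input, the standard nuclear code is assumed, and the names CODON_TABLE, SYNONYMS, count_encodings and translate are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode ("*" marks stop codons).
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


def translate(dna):
    """Translate a DNA string (length divisible by 3) codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two different DNA sequences that encode the same five-residue peptide.
variant_1 = "TATGGTGGCTTTATG"
variant_2 = "TACGGAGGGTTCATG"
same_protein = translate(variant_1) == translate(variant_2)
n_variants = count_encodings("YGGFM")
```

Codon choice among such synonymous sequences is what allows restriction sites to be introduced or host codon usage to be matched without altering the encoded protein.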

SEQ ID NO: 293 protein, and variants thereof, can be used to produce antibodies according to
methods well known in the art. The antibodies can be monoclonal or polyclonal. Antibodies can also be
synthesized against fragments of SEQ ID NO: 293 as well as variants of SEQ ID NO: 293 according to 
known methods. The subject invention also provides antibodies which specifically bind to biologically 
active fragments of SEQ ID NO: 293 or biologically active fragments of SEQ ID NO: 293 variants.

The subject invention also provides for immunoassays which are used to screen for, monitor, or
diagnose prostate cancer. Methods of screening for, diagnosing, identifying, or monitoring the course of 
prostate cancer are well known to those skilled in the art. In this aspect of the invention, immunoassays 
are provided which contact a biological sample (e.g., blood, serum, tissue, or biopsied tissue sample) with 
antibodies which specifically bind to SEQ ID NO: 293, immunogenic fragments of SEQ ID NO: 293, or
biologically active fragments of SEQ ID NO: 293. Immunocomplexes formed in the contacting step are
then detected using an appropriately labeled detection reagent. The levels of SEQ ID NO: 293 expressed
in the tested biological samples are compared to control/normal levels typically observed in the 
population. 

Alternatively, methods which screen for, monitor, or diagnose prostate cancer may be practiced
with SEQ ID NO: 293, or fragments of SEQ ID NO: 293, as well as nucleic acids encoding SEQ ID NO:
293, or fragments of SEQ ID NO: 293. In one embodiment, the polypeptide may be used as a
standard/control in the immunoassays described above. In another embodiment, the nucleic acids encoding
SEQ ID NO: 293, or fragments of SEQ ID NO: 293, are used in hybridization assays, well known to the
skilled artisan, to identify biological samples (e.g., blood, serum, tissue, or biopsied tissue sample) which
contain SEQ ID NO: 293. The levels of SEQ ID NO: 293 expressed in the tested biological samples are
compared to control/normal levels typically observed in the population. 

In another embodiment, SEQ ID NO: 293, and polynucleotide sequences encoding the amino
acid sequence of SEQ ID NO: 293, can be used to identify or diagnose immune disorders involving
activated T-cells using standard hybridization assays.

Another aspect of the invention provides methods of immunostimulating a mammal. In this 
aspect of the invention, SEQ ID NO: 293, and/or polynucleotide sequences encoding the amino acid
sequence of SEQ ID NO: 293, are introduced into T-cells according to well known methods. T-cells are
then activated by stimulation with antigen to induce the immune system of the mammal.

In another embodiment, autologous T-cells are obtained from an individual. SEQ ID NO: 293,
biologically active fragments thereof, and/or polynucleotide sequences encoding the amino acid sequence,
or biologically active fragments, of SEQ ID NO: 293, are introduced into these autologous T-cells
according to well known methods. The T-cells are expanded and reintroduced into the individual from
which the T-cells were obtained. See, for example, U.S. Patent Nos. 5,192,537 and 5,766,920, hereby
incorporated by reference in their entirety.

In another embodiment of the subject invention, polynucleotides and polypeptides encoding SEQ
ID NO: 293 can be used to expand stem cells, committed progenitors of various blood lineages, and in
the differentiation and/or proliferation of various cell types. In this aspect of the invention,
polynucleotides and polypeptides encoding SEQ ID NO: 293 are introduced into the cells and the cells
cultured. These methods may be practiced according to methods well known to the routineer. 

Protein of SEQ ID NO:316 (internal designation 188-45-1-0-D9-CS)

The protein of SEQ ID NO:316 is encoded by the cDNA of SEQ ID NO:75. Accordingly, it will
be appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:316 described
throughout the present application also pertain to the polypeptide encoded by a nucleic acid included in
clone 188-45-1-0-D9-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic
acid of SEQ ID NO:75 described throughout the present application also pertain to the nucleic acid
included in clone 188-45-1-0-D9-CS.

The protein of SEQ ID NO:316 is expressed in brain and contains three membrane-spanning
segments located between amino acid positions 6 and 26, 73 and 93, and 139 and 159, and a signal peptide
comprising the sequence FAAFCYMLSLVLC/AA. Accordingly, one embodiment of the present 
invention is a polypeptide comprising one or more of the membrane-spanning segments, and/or the signal 
peptide. 
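
By way of illustration only, the positional ranges given above can be turned into sub-sequences with a
short script. The sketch below assumes 1-based, inclusive residue numbering; the helper name
extract_segments and the placeholder sequence are hypothetical, and the actual sequence of SEQ ID NO:316
must be taken from the sequence listing.

```python
# Minimal sketch: slice a protein sequence at 1-based, inclusive positions.
# The helper name and the placeholder sequence are hypothetical; the real
# sequence of SEQ ID NO:316 is in the sequence listing, not reproduced here.

def extract_segments(sequence, ranges):
    """Return the sub-sequences for 1-based, inclusive (start, end) ranges."""
    segments = []
    for start, end in ranges:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"range {start}-{end} is outside the sequence")
        # Convert the 1-based inclusive range to a Python slice.
        segments.append(sequence[start - 1:end])
    return segments

# Positions of the three membrane-spanning segments stated in the text.
TM_RANGES = [(6, 26), (73, 93), (139, 159)]

# Placeholder of 160 residues (NOT the real protein sequence).
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 8

tm_segments = extract_segments(placeholder, TM_RANGES)
```

Each range spans 21 residues, so every returned segment has length 21 regardless of the input sequence.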

The protein of SEQ ID NO:316 is a member of the cornichon protein family. It has 48% identity
with the Drosophila melanogaster cornichon protein as well as 67% identity with the human cornichon
homolog TGAM77 (Genbank accession No. AF104398, the disclosure of which is incorporated herein by
reference in its entirety), 67% identity with hCornichon, a bone marrow secreted protein (PCT publication
WO/9933979, the disclosure of which is incorporated herein by reference in its entirety), 67% identity
with a human secreted protein encoded by gene 24 (PCT publication WO/9910363, the disclosure of
which is incorporated herein by reference in its entirety) and 67% identity with the protein product of the
mouse cnih gene. However, this protein has higher homology, 81% identity, to the mouse cornichon-like
protein (Genbank accession No. AB006191, the disclosure of which is incorporated herein by reference in
its entirety), which is the product of the mouse cnil gene. Finally, the protein of SEQ ID NO:316 has a
high level of identity with the human secreted protein encoded by gene 95 (GSP:Y76218, PCT publication
WO/9958660, the disclosure of which is incorporated herein by reference in its entirety) and is likely a
polymorphic variant of the gene 95 product. The high degree of sequence conservation between the members
of this family indicates that they are under strong selective pressure and are likely involved in important
cellular functions.
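
Percent identity figures such as those quoted above are computed from a pairwise alignment. The sketch
below shows one common convention (identical pairs divided by aligned, non-gap pairs); alignment tools
differ in the denominator they use, the text does not state which convention was applied, and the function
name and toy sequences are hypothetical.

```python
# Minimal sketch: percent identity over a pre-computed pairwise alignment.
# Convention assumed here: identical pairs / aligned (non-gap) pairs.
# The toy sequences used with it are hypothetical, not cornichon proteins.

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity between two aligned, equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        # Columns containing a gap in either sequence are not counted.
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared

# Toy alignment: 6 aligned pairs, 5 of them identical.
toy_identity = percent_identity("MKT-AYL", "MKSWAYL")
```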

The Drosophila cornichon (cni) gene product is involved in signaling processes necessary for
both anterior-posterior and dorsal-ventral pattern formation during Drosophila embryogenesis (Cell,
1995, 81:967-978). Mutations in cornichon prevent the formation of a correctly polarized microtubule
cytoskeleton in the oocyte. Cni signaling functions in concert with two other proteins, gurken and
torpedo. Gurken, a protein secreted from the oocyte that contains a single epidermal growth factor (EGF)
motif most similar in structure to vertebrate TGFα, is considered to be the ligand of the Drosophila
epidermal growth factor receptor (DER) homolog torpedo, which is expressed by the follicular epithelium.
The function of all three genes in an EGF-like signaling pathway appears to direct the formation of a
correctly polarized microtubule cytoskeleton, which is thought to be the basis for the correct spatial
localization of other signaling molecules essential for oocyte polarization, asymmetric movement of the
nucleus, and embryo differentiation. TGAM77, one of the human homologs of cornichon, is differentially
expressed in alloactivated T-cells (Biochim. Biophys. Acta 1999, 1449:203-210, the disclosure of which is
incorporated herein by reference in its entirety). Since there is a well-known involvement of the
microtubule cytoskeleton in spatial polarization of signaling events in T-cell activation, it is thought
that TGAM77 may function in a protein-tyrosine kinase pathway required for the vectorial localization of
signaling molecules in T-cell activation.

The protein of SEQ ID NO:316 is found in brain tissue, and gene 95 (GSP:Y76218, PCT
publication WO/9958660, the disclosure of which is incorporated herein by reference) is expressed in
infant brain tissue, endometrial tumor tissue and frontal cortex tissue. ESTs matching this gene are also
found in lung tissue, germ cell tumors and skin melanomas. This is similar to the expression pattern of
the murine cnil gene, which is found in 6.5-day whole embryos, 11.5-day limb bud, 13.5-day whole
embryo, adult lung and brain (Dev. Genes Evol., 1999, 209:120-125, the disclosure of which is
incorporated herein by reference in its entirety).

Polynucleotides encoding the protein of SEQ ID NO:316 or fragments thereof and polypeptides
comprising the protein of SEQ ID NO:316 or fragments thereof are useful as reagents for differential
identification of the tissue(s) or cell type(s) present in a biological sample and for diagnosis of diseases
and conditions which include, but are not limited to, endometrial tumor, and neural and developmental
diseases and/or disorders. Similarly, the protein of SEQ ID NO:316 or fragments thereof and antibodies
directed to these polypeptides are useful in providing immunological probes for differential identification
of the tissue(s) or cell type(s). For a number of disorders of the above tissues or cells, particularly of the
neural and reproductive organs, expression of this gene at significantly higher or lower levels may be
routinely detected in certain tissues or cell types (e.g., neural, reproductive, cancerous and wounded
tissues) or bodily fluids (e.g., lymph, serum, plasma, urine, amniotic fluid, synovial fluid and spinal fluid)
or another tissue or cell sample taken from an individual having such a disorder, relative to the standard
gene expression level, i.e., the expression level in healthy tissue or bodily fluid from an individual not
having the disorder.
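
The comparison of a sample's expression level with the standard level can be sketched as a simple
fold-change test. The function name and the two-fold cut-off below are hypothetical choices made for
illustration; the text does not specify what counts as a "significantly" higher or lower level.

```python
# Minimal sketch: classify an expression level relative to a standard level.
# The two-fold threshold is an illustrative assumption, not from the text.
import math

def classify_expression(sample_level, standard_level, fold_threshold=2.0):
    """Return 'higher', 'lower' or 'comparable' relative to the standard."""
    if sample_level <= 0 or standard_level <= 0:
        raise ValueError("expression levels must be positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    # Work on a log2 scale so increases and decreases are treated symmetrically.
    log2_fold_change = math.log2(sample_level / standard_level)
    cutoff = math.log2(fold_threshold)
    if log2_fold_change >= cutoff:
        return "higher"
    if log2_fold_change <= -cutoff:
        return "lower"
    return "comparable"
```

In practice the standard level would be measured in healthy tissue or bodily fluid of the same type, using the same assay and units as the test sample.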

The tissue distribution in infant brain tissue and adult brain tissue, as well as the homology to
cornichon proteins, indicates that polynucleotides encoding the protein of SEQ ID NO:316 or fragments
thereof and polypeptides comprising the protein of SEQ ID NO:316 or fragments thereof are useful for
detecting and/or treating neural and developmental disorders. The tissue distribution indicates that these
polynucleotides and polypeptides are useful for the detection/treatment of neurodegenerative disease
states and behavioural disorders such as Alzheimer's disease, Parkinson's disease, Huntington's disease,
Tourette syndrome, schizophrenia, mania, dementia, paranoia, obsessive compulsive disorder, panic
disorder, learning disabilities, ALS, psychoses, autism, and altered behaviors, including disorders in
feeding, sleep patterns, balance, and perception. In addition, the gene or gene product may also play a
role in treatment and/or detection of developmental disorders associated with the developing embryo, or
sexually-linked disorders.

Elevated expression of the protein of SEQ ID NO:316 within the brain suggests that it may be
involved in neuronal survival, synapse formation, conductance, neural differentiation, etc. Such
involvement may impact many processes, such as learning and cognition. Alternatively, the tissue
distribution in endometrial tumor tissue, germ cell tumors and skin melanomas indicates that the
translation product of this gene is useful for the detection and/or treatment of endometrial tumors and/or
reproductive disorders, as well as tumors of other tissues where expression of this gene has been
observed. Furthermore, the protein of SEQ ID NO:316 or fragments thereof may also be used to
determine biological activity, to raise antibodies, as a tissue marker, to isolate cognate ligands or
receptors, and to identify agents that modulate their interactions, in addition to its use as a nutritional
supplement. The protein of SEQ ID NO:316 or fragments thereof, as well as antibodies directed against
the protein, may be used as tumor markers and/or immunotherapy targets for the above listed tissues.

The gene encoding the protein of SEQ ID NO:316 is thought to reside on chromosome 11.
Accordingly, polynucleotides encoding the protein of SEQ ID NO:316 or fragments thereof are useful as
a marker in linkage analysis for chromosome 11.

Accordingly, the present invention includes the use of the protein of SEQ ID NO:316, fragments
comprising at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, or 200 consecutive amino
acids thereof, or fragments having a desired biological activity to treat or ameliorate a condition in an
individual. For example, the condition may be an abnormality in development, a signaling pathway,
microtubule construction, neuronal survival, synapse formation, conductance, neural differentiation, or it
may be cancer or an abnormality in any of the functions listed above. In such embodiments, the protein of
SEQ ID NO:316, or a fragment thereof, is administered to an individual in whom it is desired to increase
or decrease any of the activities of the protein of SEQ ID NO:316. The protein of SEQ ID NO:316 or
fragment thereof may be administered directly to the individual or, alternatively, a nucleic acid encoding
the protein of SEQ ID NO:316 or a fragment thereof may be administered to the individual.
Alternatively, an agent which increases the activity of the protein of SEQ ID NO:316 may be
administered to the individual. Such agents may be identified by contacting the protein of SEQ ID
NO:316 or a cell or preparation containing the protein of SEQ ID NO:316 with a test agent and assaying
whether the test agent increases the activity of the protein. For example, the test agent may be a chemical
compound or a polypeptide or peptide.

Alternatively, the activity of the protein of SEQ ID NO:316 may be decreased by administering
an agent which interferes with such activity to an individual. Agents which interfere with the activity of
the protein of SEQ ID NO:316 may be identified by contacting the protein of SEQ ID NO:316 or a cell or
preparation containing the protein of SEQ ID NO:316 with a test agent and assaying whether the test
agent decreases the activity of the protein. For example, the agent may be a chemical compound, a
polypeptide or peptide, an antibody, or a nucleic acid such as an antisense nucleic acid or a triple
helix-forming nucleic acid.

In one embodiment, the invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify the source of a sample as, for example,
brain, or to distinguish between two or more possible sources of a sample on the basis of the level of the
protein of SEQ ID NO:316 in the sample. For example, the protein of SEQ ID NO:316 or fragments
thereof may be used to generate antibodies using any techniques known to those skilled in the art,
including those described herein. Such antibodies may then be used to identify tissues of unknown
origin, for example, forensic samples or differentiated tumor tissue that has metastasized to foreign bodily
sites, or to differentiate different tissue types in a tissue cross-section using immunochemistry. In such
methods a sample is contacted with the antibody, which may be detectably labeled, under conditions
which facilitate antibody binding. The level of antibody binding to the test sample is measured and
compared to the level of binding to control cells from brain or tissues other than brain to determine
whether the test sample is from brain. Alternatively, the level of the protein of SEQ ID NO:316 in a test
sample may be measured by determining the level of RNA encoding the protein of SEQ ID NO:316 in the
test sample. RNA levels may be measured using nucleic acid arrays or using techniques such as in situ
hybridization, Northern blots, dot blots or other techniques familiar to those skilled in the art. If desired,
an amplification reaction, such as a PCR reaction, may be performed on the nucleic acid sample prior to
analysis. The level of RNA in the test sample is compared to RNA levels in control cells from brain or
tissues other than brain to determine whether the test sample is from brain.

In another embodiment, antibodies to the protein of the invention or part thereof may be used for
detection, enrichment, or purification of cells expressing the protein of SEQ ID NO:316, including using
methods known to those skilled in the art. For example, an antibody against the protein of SEQ ID
NO:316 or a fragment thereof may be fixed to a solid support, such as a chromatography matrix. A
preparation containing cells expressing the protein of SEQ ID NO:316 is placed in contact with the
antibody under conditions which facilitate binding to the antibody. The support is washed and then the
cells are released from the support by contacting the support with agents which cause the cells to
dissociate from the antibody.

In another embodiment of the present invention, the protein of SEQ ID NO:316 or a fragment
thereof may be used to diagnose disorders associated with altered expression of the protein of SEQ ID
NO:316. In such techniques, the level of the protein of SEQ ID NO:316 in an ill individual is measured
using techniques such as those described herein. The level of the protein of SEQ ID NO:316 in the ill
individual is compared to the level in normal individuals to determine whether the individual has a level
of the protein of SEQ ID NO:316 which is associated with disease.

Protein of SEQ ID NO:255 (106-037-1-0-E9-CS.cor)

The protein of SEQ ID NO:255, encoded by the cDNA of SEQ ID NO:14, is strongly expressed
in the liver and testis and shows extensive homology to human lactate dehydrogenase-A protein (LDH-A
or M chain) (Chung F.Z. et al., Biochem. J. 231:537-541 (1985); SwissProt accession number P00338).
The protein of SEQ ID NO:255 is also homologous to lactate dehydrogenase A from many vertebrates.
The 381-amino-acid-long protein of SEQ ID NO:255 displays a Prosite motif corresponding to lactate
dehydrogenase from positions 71 to 380. In addition, the active site LGEHGDS, where H is the active
site residue, is present in the protein of the invention (positions 239 to 245). The protein of the invention
also contains an additional 50 N-terminal amino acids not found in other lactate dehydrogenase A
proteins. This N-terminal extension contains a signal peptide (cleavage site at position 34 of the protein
of the invention) that may allow the export of the protein to the extracellular domain or define a particular
subcellular localization. Alternatively, the initiation codon could be at position 26 or 50 of the
protein of SEQ ID NO:255.

Lactate dehydrogenase (LDH) is an enzyme which dehydrogenates lactic acid into
pyruvic acid in conjunction with the hydrogen acceptor NAD+, and which exists in a wide
variety of animal tissues and microorganisms as an enzyme serving to produce lactic acid from
pyruvic acid in the glycolytic pathway (Abad-Zapatero C. et al., J. Mol. Biol. 198:445-467
(1987)). It is known that in vertebrates there are three isozymes of LDH: the M form (LDH-A),
found predominantly in muscle tissues; the H form (LDH-B), found in heart muscle; and the
X form (LDH-C), found only in the spermatozoa of mammals and birds. In bird and
crocodilian eye lenses, LDH-B serves as a structural protein and is known as epsilon-crystallin
(Hendriks W. et al., Proc. Natl. Acad. Sci. U.S.A. 85:7114-7118 (1988)).

LDH has been used extensively in the field of clinical test reagents for a number of
purposes. For example, it has been used as a coupling enzyme to determine the enzymatic
activity of various aminotransferases, such as alanine aminotransferase (ALT), which is
ultimately detected by UV spectrometry of the produced pyruvic acid. This use of LDH has
been widely adopted as a clinical test, because aminotransferases are enzymes which show high
activity in liver, heart, kidney, etc. and show remarkable increases in serum in association with
various diseases. LDH has also been used as a coupling enzyme to help determine the level of
substrates such as urea, as the enzyme promotes the conversion of such substances into pyruvic
acid which can be detected by UV spectrometry.

Lactate dehydrogenase is also a widely used marker for heart disease and other
conditions. For example, levels of LD-1 are elevated in the presence of myocardial infarction
and in other conditions such as leukemia. Levels of lactate dehydrogenase start to increase 24 to
48 hours after occlusion of the coronary artery, peak in 3 to 6 days, and return to normal in 8 to
14 days. In addition, levels of LD-1 are elevated 10 to 12 hours after the acute myocardial
infarction, peak in 2 to 3 days, and return to normal in approximately 7 to 10 days. Thus,
measurement of the level of lactate dehydrogenase allows a prolonged retrospective diagnosis of
myocardial infarction. Further, while the amount of LD-2 in the blood is usually higher than the
amount of LD-1, patients with acute myocardial infarction have more LD-1 than LD-2. This
"flipped ratio" usually returns to normal in 7 to 10 days. An elevated level of LD-1 with a
flipped ratio has a sensitivity and specificity of approximately 75% to 90% for detection of acute
myocardial infarction.
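
The "flipped ratio" criterion described above reduces to two comparisons, sketched below. The function
names are hypothetical, and the upper reference limit for LD-1 is deliberately left as a caller-supplied
parameter because the text gives no numerical reference range; the sketch illustrates the logic only.

```python
# Minimal sketch of the "flipped ratio" logic described in the text.
# Function names are hypothetical; no reference range is given in the text,
# so the LD-1 upper limit must be supplied by the caller.

def has_flipped_ratio(ld1, ld2):
    """True when LD-1 exceeds LD-2 (normally LD-2 is the higher of the two)."""
    if ld1 < 0 or ld2 < 0:
        raise ValueError("isoenzyme levels cannot be negative")
    return ld1 > ld2

def elevated_ld1_with_flipped_ratio(ld1, ld2, ld1_upper_limit):
    """True when LD-1 is above its reference limit AND the ratio is flipped."""
    return ld1 > ld1_upper_limit and has_flipped_ratio(ld1, ld2)
```

Both inputs must be expressed in the same units (for example, as activities or as fractions of total LDH) for the comparison to be meaningful.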

Elevated LDH levels have also been used as a prognostic indicator for cancers such as
small cell lung carcinoma. Specifically, elevated levels of LDH indicate a poor prognosis for
such diseases (Kawahara et al., Jpn J Clin Oncol. 1997 Jun;27(3):158-65).

LDH expression in cells has also been shown to be induced by interleukin-1 alpha, a
major cytokine associated with, e.g., inflammation (Nehar et al., Biol Reprod. 1998
Dec;59(6):1425-32).

Islet beta-cells express low levels of lactate dehydrogenase and have high glycerol
phosphate dehydrogenase activity. The effects on glucose metabolism and insulin secretion of
acute overexpression of the skeletal muscle isoform of lactate dehydrogenase (LDH)-A in these
cells have been studied by Ainscow EK et al. (Diabetes 2000 Jul;49(7):1149). The results of
these studies have shown that overexpression of LDH activity interferes with normal glucose
metabolism and insulin secretion in islet beta cells, and it may therefore be directly responsible
for insulin secretory defects in some forms of type 2 diabetes. These results also reinforce the
view that glucose-derived pyruvate metabolism in the mitochondria is critical for
glucose-stimulated insulin secretion in beta cells. Other data show that an overexpression of lactate
dehydrogenase A attenuates glucose-induced insulin secretion in stable MIN-6 beta-cell lines,
which normally express low levels of L-lactate dehydrogenase (Zhao C, Rutter GA, FEBS Lett.
1998 Jul 3;430(3):213-6). Low LDH activity thus appears to be important in beta-cell glucose
sensing.

Analysis of the LDH isoenzyme pattern in cerebrospinal fluid (CSF) has also been shown to be helpful in
the evaluation of CNS involvement in patients with hematologic malignancies (Lossos IS, et al., Cancer.
2000 Apr 1;88(7):1599-604).

It is believed that the protein of SEQ ID NO:255 is a lactate dehydrogenase protein, most likely
of the LDH-A or M subtype. The activity of the present protein can be assessed using any standard
method for detecting lactate dehydrogenase enzyme activity, including those involving the UV detection
of pyruvate, a product of LDH-catalyzed enzymatic reactions.
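
As an illustration of how a UV reading is converted to enzyme activity, the sketch below applies the
Beer-Lambert law. It assumes, as an alternative to the pyruvate readout mentioned above and not stated in
the text, that the reaction is followed through NADH absorbance at 340 nm, using the commonly cited
millimolar extinction coefficient of 6.22 mM^-1 cm^-1. The function name and the example volumes are
hypothetical.

```python
# Minimal sketch: convert a rate of absorbance change into enzyme activity.
# Assumption (not from the text): the reaction is followed via NADH at 340 nm,
# with the commonly cited extinction coefficient of 6.22 mM^-1 cm^-1.

NADH_EXTINCTION_MM = 6.22  # mM^-1 cm^-1 at 340 nm

def ldh_activity_u_per_ml(delta_a340_per_min, total_volume_ml,
                          sample_volume_ml, path_length_cm=1.0):
    """Return activity in U/mL of sample (1 U = 1 micromole NADH per minute)."""
    if total_volume_ml <= 0 or sample_volume_ml <= 0 or path_length_cm <= 0:
        raise ValueError("volumes and path length must be positive")
    # Beer-Lambert: rate of concentration change in the cuvette (mM per min,
    # i.e. micromoles per mL per min). The sign only encodes the direction.
    rate_mm_per_min = abs(delta_a340_per_min) / (NADH_EXTINCTION_MM * path_length_cm)
    # Scale from the cuvette volume back to the volume of sample added.
    return rate_mm_per_min * total_volume_ml / sample_volume_ml

# Example: 0.0622 absorbance units/min, 1.0 mL assay, 0.1 mL sample, 1 cm path.
example_activity = ldh_activity_u_per_ml(0.0622, 1.0, 0.1)
```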

In one embodiment, the polypeptides and polynucleotides of the invention are used to detect testis
and liver tissue, as well as cells derived from these tissues. For example, nucleic acids and proteins of the
invention can be labeled isotopically or chemically, using methods known to those skilled in the art, and
used as probes in northern blots, far-western blots and in situ hybridization experiments. An ability to
detect specific cell types is useful, e.g., for the determination of the history of tumor cells, as well as for
the identification of cells and tissues for histological studies.

In another embodiment, the present protein can be used in any of a variety of clinical assays
involving LDH enzymes. For example, the protein can be used as a coupling enzyme to determine the
enzymatic activity of various aminotransferases, such as alanine aminotransferase (ALT), as detected by
UV spectrometry of the produced pyruvic acid. Such assays have significant clinical utility, as
aminotransferases are enzymes which show high activity in liver, heart, kidney, etc. and show remarkable
increases in serum in association with various diseases. The protein of the invention can also be used as
a coupling enzyme to help determine the level of substrates such as urea, as the enzyme promotes the
conversion of such substances into pyruvic acid which can be detected by UV spectrometry.

In another embodiment, the present protein can be used to identify ingredients for cosmetic
formulations. Specifically, inhibitors of lactate dehydrogenase can be included in cosmetic compositions
to stimulate keratinocyte proliferation and collagen synthesis in cutaneous tissues. The inhibitors can be
combined with other active ingredients such as pyruvic acid, acetic acid, acetoacetic acid,
beta-hydroxybutyric acid, Krebs cycle pathway metabolites, aliphatic saturated or unsaturated fatty acids
containing from 8 to 26 carbon atoms, omega-hydroxy acids containing from 22 to 34 carbon atoms,
glutamic acid, glutamine, valine, alanine, leucine, and mixtures thereof (see, e.g., US Patent 5,853,742,
the disclosure of which is hereby incorporated by reference in its entirety).

In another embodiment, the present invention provides methods for treating or preventing cancer,
e.g., by inhibiting lactate dehydrogenase activity in cells, preferably specifically the cancer cells, of a
patient. The expression or activity of lactate dehydrogenase can be inhibited using any of a large number
of agents, including, but not limited to, antibodies, antisense molecules, ribozymes, and heterologous
molecules that inhibit the expression or activity of the lactate dehydrogenase in the cancer cells of the
patient. In one embodiment, lactate dehydrogenase that has been obtained from a primate, or anti-lactate
dehydrogenase antibodies obtained from a mammal as a result of the parenteral administration of primate
lactate dehydrogenase to said mammal, is parenterally administered to human cancer patients. Antibodies
derived from the protein of the invention or part thereof can also be used to inhibit cancer cell
development as described in US Patent No. 4,620,972.

Analysis of the LDH isoenzyme pattern in CSF has been shown to be helpful in the
evaluation of CNS involvement in patients with hematologic malignancies (Lossos IS, et al., Cancer. 2000
Apr 1;88(7):1599-604). Thus, in another embodiment, the protein of SEQ ID NO:255 can be used to
develop assays to monitor the LDH isoenzyme activity in CSF, thereby improving the sensitivity of
CSF cytology. This assay may be derived, e.g., from the methods described by Short S. et al. (J Biol
Chem. 2000 Apr 28;275(17):12963-9).

In another embodiment, the protein of SEQ ID NO:255 is used to detect and/or treat insulin
secretory defects in some forms of type 2 diabetes. For example, several lines of evidence indicate that LDH
overexpression may be involved in certain types of diabetes. Therefore, the detection of an elevated level
of LDH in a patient, e.g., in pancreatic islet cells of a patient, can be used as an indication that the patient
has diabetes, or is at risk of developing diabetes. Similarly, methods of inhibiting the expression or
activity of LDH in those cells, e.g., using antibodies, antisense sequences, or heterologous compounds that
inhibit the expression or activity of LDH, can be used to treat or prevent diabetes.

In another embodiment, the protein of the invention can be used to eliminate endogenous pyruvic
acid in cells in vitro or in vivo.

In another embodiment, the expression of the present protein is used as a marker for interleukin 1,
e.g., IL-1 alpha, activity in cells or in a patient. Specifically, as it has been shown that LDH expression is
induced by IL-1 alpha, the expression, or elevated expression, of the present protein can be used as a
marker for the action of IL-1 on the cell. As IL-1 has been implicated in a number of physiological
processes, including inflammation and more specifically in deleterious processes such as arthritis and
autoimmune disorders, the present protein can serve as a marker for the presence of such disorders, or for
a predisposition for the disorders.

In another embodiment, the present protein is used to detect heart disease and other
diseases in patients. For example, levels of LDH are known to rise following myocardial
infarction and other heart ailments. Accordingly, the detection of an elevated level of the protein
of the invention, alone or in view of the levels of other proteins such as other LDH isozymes, can
be used as an indicator of a heart attack or other diseases, including leukemia. The levels of
LDH can be assessed in any tissue or biological sample, including, but not limited to, serum, and
can be detected using any standard method, including, but not limited to, immunoassays and
assays for LDH enzyme activity.

In another embodiment, the present protein is used to determine a prognosis for any of a
number of diseases, including cancers such as small cell lung carcinoma. For example, the level
of the present protein is detected in the serum of a patient suffering from cancer, wherein the
detection of an elevated level of expression or activity of the protein indicates a worse prognosis
for the patient compared to the prognosis in a patient with a normal level of the protein activity
or expression.

Proteins of SEQ ID NOs:243 and 253 (internal designation numbers 105-016-1-0-D3-CS and
105-095-2-0-G11-CS)

The 331-amino-acid-long protein of SEQ ID NO:243, encoded by the cDNA of SEQ ID NO:2, is
found in prostate and in fetal brain and is homologous to a secreted human protein (Genseq accession
number Y59685). In addition, this protein is highly homologous to the putative
glycerophosphodiester phosphodiesterase (GP-PDE) MIR16 (Membrane Interacting protein of RGS16)
protein (SPTREMBLNEW SPTREMBL SWISSPROT accession number AAF65234) encoded by the
cDNA of GENPEPT GENPEPTNEW accession number AF212862; in fact, the protein of the invention is
a likely variant of the MIR16 protein. Furthermore, a BLAST search with the amino acid sequence of
SEQ ID NO:243 indicates that the protein of the invention is homologous to GP-PDEs of E. coli
(SWISSPROT accession numbers P09394 and P10908) and Haemophilus influenzae (SWISSPROT
accession number Q06282). The protein of SEQ ID NO:243 displays two candidate membrane-spanning
segments, from amino acids 7 to 27 and 258 to 278, and a putative signal peptide from amino acids 19 to
24. Finally, the protein of the invention has two putative N-glycosylation sites: asparagine residues at
positions 168 and 198 (Zheng et al., Proc. Natl. Acad. Sci. 97:3999-4004 (2000)).

The cDNA of SEQ ID NO:2 differs from the cDNA of GENPEPT GENPEPTNEW accession
number AF212862 by its extended 5' and 3' termini, and from the cDNA of SEQ ID NO:12 by
polymorphisms and alternate splicings.

The MIR16 (Membrane Interacting protein of RGS16) protein, which is homologous to the
protein of the invention, was identified in a yeast two-hybrid screen of a pituitary cell cDNA library using
the RGS16 (Regulator of G protein Signaling) protein as bait (Zheng et al., Proc. Natl. Acad. Sci.
97:3999-4004 (2000)). and Sasaki, J. Bacteriol. 175:4569-4571 (1993); Zheng et al., ibid.). Remarkably,
the GP-PDE from Haemophilus influenzae (also called protein D), which is 67% identical to the
periplasmic GP-PDE of E. coli, presents affinity for human immunoglobulin D (Janson et al., Infect.
Immun. 62:4848-854 (1994)).

From sequence alignments, it can be seen that the N-terminal region of MIR16 (amino acids 70-
150), immediately after the putative signal peptide, is highly conserved (40-61% similarity), suggesting
that it may contain residues critical for catalytic activity, i.e., the catalytic site. GP-PDEs hydrolyze
deacylated phospholipids (GPs), such as glycerophosphocholine (GPC) and glycerophosphoethanolamine,
to sn-glycerol-3-phosphate (G3P) and the corresponding alcohols (Zheng et al., ibid.). The putative
enzymatic activity of MIR16 and its interaction with RGS16 suggest that it may play important roles in
lipid metabolism and in G protein signaling. As shown in northern blot experiments, the MIR16 mRNA
is highly transcribed in heart, liver, kidney, testis and brain. The observed expression of MIR16 in the
brain is consistent with the above-described expression of the protein of the invention in the fetal brain.

It is believed that the proteins of SEQ ID NOs:243 and 253 or part thereof are members of the
glycerophosphodiester phosphodiesterase protein family, interact with the RGS16 protein and, as such,
play important roles in both lipid metabolism and in G protein signaling. Preferred polypeptides of the
invention are polypeptides comprising the amino acids of SEQ ID NO:243 from positions 7 to 27, 19 to
24 and 258 to 278. Other preferred polypeptides of the invention are fragments of SEQ ID NO:243 or
253 having any of the biological activities described herein. Additional preferred polypeptides are those
that comprise asparagine residues at positions 168 and/or 198.
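
Putative N-glycosylation sites are commonly predicted by scanning for the N-X-S/T sequon, where X is any
residue except proline. The text does not state how positions 168 and 198 were identified, so the sketch
below is an assumption about a typical approach; the function name and the toy sequence are hypothetical,
and the real sequence of SEQ ID NO:243 must be taken from the sequence listing.

```python
# Minimal sketch: report 1-based positions of asparagines in N-X-S/T sequons.
# The sequon rule is a common prediction heuristic and is assumed here; the
# toy sequence below is hypothetical, not SEQ ID NO:243.
import re

# "N", followed (via lookahead, so overlapping sequons are found) by any
# residue other than proline and then serine or threonine.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_n_glyc_sites(sequence):
    """Return 1-based positions of asparagines in N-X-S/T sequons (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]

# Toy example: N3 (N-G-S) and N11 (N-K-T) match; N7 (N-P-T) does not.
toy_sites = find_n_glyc_sites("MANGSANPTLNKT")
```

A sequon match indicates only a candidate site; whether a given asparagine is actually glycosylated depends on the protein's topology and cellular context.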

The invention first relates to methods and compositions using cDNAs of SEQ ID NO:2 or 12 or
part thereof, and proteins of the invention SEQ ID NO:243 or 253 or part thereof to identify specific cell
types, preferably from prostate or fetal brain. For example, nucleic acids and proteins of the invention are
labeled isotopically or chemically following methods known to those skilled in the art, and further used as
probes in northern blots, far-western blots and in situ hybridization detection experiments. An ability to
detect specific cell types is useful, e.g., for the determination of the history of tumor cells, as well as for
the identification of cells and tissues for histological studies.

Any of a number of in vitro assays can be used to detect SEQ ID NO:243 or 253 protein activity,
for example for in vitro screening of modulators of protein activity. Preferably cDNA encoding the
protein of the invention is cloned in a prokaryotic expression vector, according to methods known to
those skilled in the art. Briefly, the GP-PDE activity of the recombinant protein is analyzed by a coupled
spectrophotometry assay as described by Larson and collaborators and adapted by Cameron and
collaborators (Larson et al., J. Biol. Chem. 258:5426-5432 (1983); Cameron et al., Infect. Immun.
66:5763-5770 (1998)). Such enzymatic activity may be measured in vitro in the presence of modulating
drugs.

Another embodiment of the present invention relates to methods of using the protein of the 

10 invention or part thereof to purify or specifically bind to human immunoglobin D. Several immunoglobin 
(Ig) binding bacterial cell wall proteins have been isolated and/or cloned during the last two decades. The 
best characterized of these are protein A of Staphylococcus aureus (which binds to human IgG subclasses 
1, 2 and 4, IgG of several mammalians species, and in some instances human Ig of classes A, M, E), and 
protein G of group G beta-hemolytic streptococci (which binds to all human IgG subclasses and which
also displays a wider binding spectrum for animal IgG than protein A). IgD binds to neither protein A
nor protein G. Consequently, it is of great interest to identify new proteins capable of binding IgD, 
thereby allowing its separation and purification. In addition, IgD binding proteins can also be used in 
immunoprecipitation procedures with IgD, as are routinely performed with proteins A and G in the case 
of IgG. The binding and purification of IgD using the protein of the invention can be accomplished in
any of a number of ways, for example by generating a fusion protein or polypeptide in which the protein
of the invention or part thereof, is combined with another protein by the use of a recombinant DNA 
molecule. The resulting fusion product including the protein of the invention or part thereof is then 
covalently, or by any other means, bound to a protein, carbohydrate or matrix (such as gold, "Sephadex" 
particles, polymeric surfaces). Such a complex is very useful for IgD immobilization and consecutive
batch immunoprecipitations. Similar assays for binding of protein D (GP-PDE) of Haemophilus
influenzae and IgD are described in US Patent No. 6,025,484.

Another embodiment of the invention relates to compositions and methods using the protein of 
the invention, or part thereof, as GP-PDE enzymes to hydrolyze deacylated phospholipids (GPs), such as 
glycerophosphocholine (GPC) and glycerophosphoethanolamine, to sn-glycerol-3-phosphate (G3P) and
the corresponding alcohols. First, this enzymatic activity, which belongs to the class of specific
phospholipase D, makes the protein of the invention very useful to study biological membranes and their 
phospholipidic components. Moreover, as glycerophospholipids are major components of the lipidic 
bilayer, elimination of their hydrophilic moiety using the GP-PDE activity of the protein of the invention 
would likely modify the structure and consequently the permeability of eukaryotic cell membranes. Such
modifications could improve the transfection efficiency of eukaryotic cells, in vitro or in vivo. Typically,
in such embodiments the purified protein of SEQ ID NOs:243 or 253 is administered to cells; purified
proteins of the invention can be obtained in any of a number of ways, for example by inserting the cDNA
encoding the proteins into a prokaryotic expression vector using any technique known to those skilled in
the art. The recombinant protein produced and purified in the prokaryotic system is then added to an in
vitro culture of eukaryotic cells before or during transfection. The recombinant protein of the invention
can also be used to increase the efficiency of cell transfection in vivo, most notably in the case of gene 
therapy. For example, tumoral masses are very often resistant to transfection, and the protein of the 
invention would likely provide an effective way to facilitate the introduction of cytotoxic genes (such as
pro-apoptotic genes) or antitumoral drugs into solid tumors.

Still another embodiment of the protein of the invention relates to methods and compositions to 
diagnose, treat, and prevent disorders associated with excess glutamate signaling in the brain. As 
described above, the MIR16 protein interacts physically with the RGS16 protein (Regulator of G protein 
Signaling 16). Receptors of many hormones use heterotrimeric G proteins for signal transduction after
ligand binding (for a review, see Neer, Cell 80:249-257 (1995)). Among these receptors are
metabotropic glutamate receptors (mGluRs). These receptors, which are expressed in the brain, like the
protein of the invention, are a novel family of cloned G-protein-coupled receptors (Schoepp and Conn,
Trends Pharmacol. Sci. 14:13-20 (1993)). Endogenous glutamate, by activating the mGluR1 receptor
(and also NMDA and AMPA receptors), may contribute to the brain damage occurring acutely after
epilepsy, cerebral ischemia or traumatic brain injury. It may also contribute to chronic neurodegeneration
in such disorders as amyotrophic lateral sclerosis and Huntington's chorea (Meldrum, J. Nutr. 130(4S
Suppl):1007S-1015S (2000)).

The invention thus relates to methods and compositions using cDNAs of SEQ ID NO:2 or 12 or 
part thereof, and proteins of SEQ ID NO:243 or 253 or part thereof, to diagnose, treat, or prevent
disorders associated with excess glutamate signaling in the brain. Specifically, the level of activity or
expression of the proteins can be correlated with the level of glutamate signaling, or with the glutamate- 
signaling associated brain damage involved in epilepsy, cerebral ischemia, traumatic brain damage, ALS, 
or Huntington's chorea, or with any other G-protein associated physiological process or disease or 
condition. For situations where the level of the expression or activity of the protein is positively
correlated with such signaling or with the presence of a disease or condition, the signaling, disease or
condition can be detected using any of a number of tools for detecting protein expression or activity, 
including northern blots, far-western blots and in situ hybridization experiments, where an elevated level 
of the protein, protein activity, or nucleic acid of the invention indicates the presence of the disease, 
condition, or signaling process. Further, such diseases or conditions can be treated or prevented, or such
signaling pathways can be inhibited, using compounds that inhibit the expression or activity of the
protein, such as antibodies, antisense molecules, ribozymes, dominant negative forms of the protein, or 
any heterologous molecule that inhibits protein activity or expression. Alternatively, where the 
expression or activity of the protein of the invention is negatively associated with the signaling pathway, 
disease or condition, detection of a decreased level of expression or activity of the protein can be used to
indicate the presence of the disease, condition, or pathway. Further, in such cases, the disease or
condition can be treated or prevented, or the pathway can be inhibited, using any compound that increases the
activity or level of the protein, such as nucleic acids encoding the protein, the protein itself, or 
heterologous compounds that cause an increase in the level of protein expression or activity. 

Protein of SEQ ID NO:386 (internal designation 105-037-4-O-H12-CS)

The protein of SEQ ID NO:386, encoded by the cDNA of SEQ ID NO:145, is strongly expressed
in the fetal brain and uterus. The 207-amino-acid-long protein of SEQ ID NO:386 displays pfam SPRY 
domains from positions 85 to 205. 

SPRY domains have been found in a number of proteins involved in multiple cellular and
developmental processes. For example, the Midline-1/FXY family of proteins has been shown to
associate with microtubules, and has been implicated in human diseases, such as Opitz Syndrome, a 
congenital disorder characterized by multiple developmental abnormalities (see, e.g., Cainarca, et al., 
(1999) Hum Mol Genet 8(8): 1387-96). In addition, the cytoplasmic Marenostrin/Pyrin protein has been 
demonstrated to be the cause of Familial Mediterranean fever, an autosomal recessive disorder
characterized by fever and serositis (Nat Genet 1997 Sep;17(1):25-31). Other SPRY proteins include
SplA, a serine protease from Staphylococcus aureus, and butyrophilin, a major milk protein. Another 
family of proteins known to contain the SPRY domain are the Ryanodine receptors (RyRs). 

Ryanodine receptors play an important role in Ca2+ signaling in muscle and non-muscle cells by
releasing Ca2+ from intracellular stores. For example, these receptors are centrally important in
excitation-contraction (e-c) coupling, which occurs at specialized regions where the sarcoplasmic
reticulum (SR), containing the ryanodine receptors, and the plasma membrane/transverse-tubule system
form junctions. RyRs are also thought to play some role in maintaining the structural integrity of the
SR/T-tubule junctions. RyR is apparently unable to carry out the requisite functions associated with e-c
coupling by itself, however, because it forms interactions with other macromolecules at the triad junction.
For example, two small proteins, calmodulin and FKBP12, are believed to modulate RyR at the triad
junction.

It is believed that mammalian tissues express three different RyR isoforms, comprising four 560- 
kDa (RyR polypeptide) and four 12-kDa (FK506 binding protein) subunits. It is believed that these large
protein complexes conduct monovalent and divalent cations and are capable of multiple interactions with
other molecules. The subunits of the protein complexes include small diffusible endogenous effector
molecules including Ca2+, Mg2+, adenine nucleotides, sulfhydryl modifying reagents (glutathione, NO,
and NO adducts) and lipid intermediates, and proteins such as protein kinases and phosphatases,
calmodulin, immunophilins (FK506 binding proteins), and in skeletal muscle the dihydropyridine
receptor. The RyR from skeletal muscle is the major calcium release channel for that tissue, and the most 
intensively studied of the three genetic isoforms detected thus far in mammalian species. The other two 
RyR isoforms are often referred to as the 'heart' and 'brain' forms, but the actual cell and tissue
distribution of the isoforms is complex. 

Because of their multiple ligand interactions, ryanodine receptors constitute an important,
potentially rich pharmacological target for controlling cellular functions. Ca2+ release channel activity is
modulated by many endogenous effectors, including Ca2+, ATP, Mg2+, and calmodulin. In addition, 
many exogenous effectors, including caffeine, local anesthetics, and polyamines, also modify channel
activity. For example, tetracaine, procaine, benzocaine, and lidocaine inhibit Ca2+ release from the SR.
They appear to interact with a specific site(s) located on the RyR, affecting both ryanodine-binding and
single channel activities (Shoshan-Barmatz et al. 1993; J. Membr. Biol.; 133; 171-181).

The importance of intracellular calcium as a second messenger in cellular signal transduction 
processes is well established. Alterations in intracellular Ca2+ homeostasis have profound effects on 
many cell functions, including secretion, contraction-relaxation, motility, metabolism, protein synthesis,
modification and folding, gene expression, cell-cycle progression and apoptosis. A major source of
cytoplasmic calcium is from intracellular storehouses located in the endoplasmic reticulum, or in muscle, 
within the sarcoplasmic reticulum (SR). 

Given that cellular Ca2+ handling is an important factor in the control of neuronal metabolism 
and electrical activity, abnormalities of intracellular Ca2+ channels might be expected to contribute to
some forms of epilepsy or to anoxic brain damage following an episode of cerebral ischemia. Cell loss is
said to be a characteristic feature of degenerative brain disorders, including Alzheimer's disease. It is
well established that neuronal cell death may be secondary to an abnormal elevation of cytoplasmic Ca2+,
particularly that associated with activation of excitatory glutamate receptors (e.g., in epilepsy). This
strongly suggests that the release of stored Ca2+ contributes to nerve cell damage and cell death in
various circumstances.

It is believed that the protein of SEQ ID NO:386 is functionally related to other SPRY-containing
proteins, such as the ryanodine receptors, Marenostrin/Pyrin, SplA, Midline-1/FXY, and butyrophilin.
Accordingly, it is believed that the present protein is associated with the release of Ca2+ from
intracellular Ca2+-storing organelles, like the endoplasmic reticulum and, in muscle, the sarcoplasmic
reticulum (SR), as well as being involved in microtubule binding. Preferred polypeptides of the invention
are any fragments of SEQ ID NO:386 having any of the biological activities described herein. 

In one embodiment, the present protein and nucleic acids can be used to specifically detect cells 
of the fetal brain and uterus, as the protein is overexpressed in these tissues. For example, the protein of 
the invention or part thereof may be used to synthesize specific antibodies using any technique known to
those skilled in the art. Such tissue-specific antibodies may then be used to identify tissues of unknown 
origin, such as in forensic samples, differentiated tumor tissue that has metastasized to foreign bodily 
sites, etc., or to differentiate different tissue types in a tissue cross-section using immunochemistry. The 
protein can also be used to specifically label microtubules in cells. 

In another embodiment, the protein of the invention or part thereof may be used in regulating
intracellular Ca2+ levels. As alterations in intracellular Ca2+ homeostasis have profound effects on many
cell functions, including secretion, contraction-relaxation, motility, metabolism, protein synthesis, 
modification and folding, gene expression, cell-cycle progression and apoptosis, the ability to modulate 
intracellular Ca2+ levels provides a tool to alter any of these cellular functions, in vitro or in vivo. Such
an ability has wide utility for a large number of applications, for example to manipulate the behavior (e.g.
growth rate, secretion, survival, etc.) of cells grown in vitro, as well as to treat, prevent, or diagnose any 
of a number of diseases associated with altered Ca2+ signaling in vivo. The activity or expression of the 
protein of the invention can be modulated in any of a large number of ways, for example by administering 
to cells or to a patient the protein itself, a polynucleotide encoding the protein, antibodies, antisense
sequences, dominant negative forms of the protein, compounds that alter the expression or activity of the
protein, etc. The effect of any such agent on calcium flux in cells can be detected using standard 
methods, including by studying the permeation of Ca2+ release through endoplasmic reticulum (ER) and 
sarcoplasmic reticulum (SR) channels using tracers, light scattering and fluorescence quenching, and 
channel reconstitution in planar bilayer. In addition, targeted recombinant photoproteins can provide
direct measurements of organellar Ca2+ (Montero et al.; 1995; EMBO J.; 14, 5467-5475).

The invention further relates to methods and compositions using the protein of the invention or 
part thereof to diagnose, prevent and/or treat several disorders in which the activity or recognition of 
ryanodine receptors is impaired or excessive. These disorders include, but are not limited to,
neurodegenerative diseases, cardiovascular disorders, severe myasthenia, malignant hyperthermia,
epilepsy, and central core disease. For example, in patients with severe myasthenia, the level of anti-RyR
antibodies has been directly related to the severity of the disease (Skeie et al., 1996: Eur. J. Neurol. 3; 
136-140). There is also some evidence to suggest that RyR abnormalities are a primary cause of many 
types of cardiac disease. In addition, the protein of the invention can be used to diagnose other diseases 
associated with SPRY-protein dysfunction, such as Familial Mediterranean fever and Opitz syndrome.
Finally, as SPRY-containing proteins have been implicated in embryonic development (e.g. the Midline-1
protein), the protein and nucleic acids of the invention can be used to detect developmental disorders, as
the detection of a mutation in the gene encoding SEQ ID NO:386, or a detection of abnormal gene
expression in a fetus, can be used to indicate the presence of a developmental abnormality. For example,
as the protein of SEQ ID NO:386 is strongly expressed in the fetal brain, it is likely that the protein plays
a role in the normal development of the brain in utero. 

The present invention also relates to diagnostic assays for detecting altered levels of the protein of 
SEQ ID NO:386 in various tissues, as over-expression of the protein compared to normal control tissue 
samples can indicate the presence of certain disease conditions such as neurodegenerative disorders, 
cardiovascular disorders, severe myasthenia, malignant hyperthermia, epilepsy, and central core disease.
Assays used to detect levels of the polypeptide of the present invention in a sample derived from a host
are well-known to those of skill in the art and include radioimmunoassays, competitive-binding assays,
Western blot analysis and ELISA assays.

Proteins of SEQ ID NOs:283 and 286 (internal designations 174-38-1-0B6-CS LA and 174-41-1-0-A6-CS LA)

The protein of SEQ ID NO:283, encoded by the cDNA of SEQ ID NO:42, is overexpressed in 
salivary glands and to a lesser extent in bone marrow, and shows homology over the C-terminal length to 
the immunoglobulin (Ig) protein superfamily, which is conserved among eukaryotes (including rabbit,
rodents and human). In particular, the 468-amino-acid-long protein of the invention, which is similar in
size to the constant chain of Ig related proteins, displays two pfam conserved immunoglobulin domains,
from position 205 to 285 and from position 318 to 384, which are known to be involved in the basic
structure of the light and heavy constant chains of immunoglobulins. It is known (Orr H.T., Nature
282:266-270(1979)) that the Ig constant chain domains and a single extracellular domain in each type of 
MHC chain are closely related, sharing over one hundred amino-acids of homology. All members of the
Ig related superfamily, including the MHC class I alpha chain and beta-2-microglobulin, as well as the
MHC class II alpha and beta chains, display the prosite conserved characteristic pattern around the C- 
terminal cysteine ([FY]-x-C-x-[VA]-x-H). This cysteine is involved in the disulfide bond between the 
light and heavy chains, and is also found in the protein of the invention (position 380 to 386). The protein 
of the invention also exhibits an emotif Ig and Major Histocompatibility Complex protein signature from
positions 319 to 336. In addition, the protein of the invention displays homology with tapasin (GenBank
No. AF009510), a chaperone-like protein closely associated with TAP-binding proteins, which is well
conserved among eukaryotes (chicken, rodents and human). Tapasin has been shown to increase the 
efficiency of antigen processing and presentation by mediating the association of MHC complex proteins
with TAP proteins to the endoplasmic reticulum and to the cell surface during immune response (for
review see Abele, R. and Tampe, R., Biochim. Biophys. Acta, 1999). In addition, the protein of the
invention displays two transmembrane domains from positions 199 to 219 and from positions 406 to 426,
a hydrophobic profile similar in amino acid position to the hydrophobic stretch of amino acids of human
and mouse tapasin (Suling L., J. Biol. Chem., 274:8649-8654, 1999), and a secreted signal peptide from
position 9 to 23. Both signatures are largely present in Ig related proteins such as secreted antibodies or
antigen presenting proteins. The invention also encompasses a variant (SEQ ID NO:286) of SEQ ID
NO:283, encoded by the cDNA of SEQ ID NO:45. The protein of SEQ ID NO:286 is a 442-amino-acid-
long protein whose C-terminal end is 26 amino acids shorter than that of the protein of SEQ ID NO:283.
The variant of SEQ ID NO:286, which results from a frameshift (position 1445 in SEQ ID NO:45) in the
coding sequence that leads to a stop codon in the corresponding protein, displays characteristics identical 
to those described above in terms of motifs, Ig signatures, function, and potential uses. 
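
The prosite pattern cited above, [FY]-x-C-x-[VA]-x-H, uses bracketed lists for the residues allowed at a position and "x" for any residue; it spans seven positions, which is consistent with the stated location of the motif (positions 380 to 386). As a minimal sketch of how such a pattern could be searched for in a protein sequence, the following Python code converts the pattern to a regular expression and reports 1-based match positions. The function names and the toy sequence are illustrative assumptions; the toy sequence is made up and is not SEQ ID NO:283 or any sequence of the invention.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Handles 'x' (any residue), '[..]' (allowed residues), '{..}' (excluded
    residues) and repeat counts such as 'x(2)' or 'x(2,4)'.
    """
    regex = []
    for element in pattern.strip(".").split("-"):
        # Split off an optional repeat count such as (2) or (2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        regex.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(regex)

def find_motif(sequence, pattern):
    """Return 1-based inclusive (start, end) positions of every match."""
    # A lookahead is used so that overlapping matches are also reported.
    compiled = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)))
            for m in compiled.finditer(sequence)]

IG_MHC_PATTERN = "[FY]-x-C-x-[VA]-x-H"

# Hypothetical toy sequence for illustration only.
toy_sequence = "MKTAYRCNVEHGGSFTCQAPH"
print(prosite_to_regex(IG_MHC_PATTERN))           # [FY].C.[VA].H
print(find_motif(toy_sequence, IG_MHC_PATTERN))   # [(5, 11), (15, 21)]
```

Positions are reported 1-based and inclusive so that they can be compared directly with the residue positions used in this description.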

The immunoglobulin (Ig) gene superfamily comprises a large number of cell surface 
glycoproteins that share sequence homology with the V and C domains of antibody heavy and light
chains. These molecules function as receptors for antigens, immunoglobulins and cytokines as well as
adhesion molecules, and play important roles in regulating the complex cell interactions that occur within
the immune system (A. F. Williams et al., Annu. Rev. Immunol. 6:381-405, 1988; T. Hunkapiller et al.,
Adv. Immunol. 44:1-63, 1989; for a short review see also Prosite entry PS00290).

The introduction of an antigen into a host initiates a series of events culminating in an immune
response. In addition, self-antigens can result in immunological tolerance or activation of an immune
response against self-antigens. A major portion of the immune response is regulated by presentation of 
antigen by major histocompatibility complex molecules. MHC molecules bind to peptide fragments 
derived from antigens to form complexes that are recognized by T cell receptors on the surface of T cells, 
giving rise to the phenomenon of MHC-restricted T cell recognition. The ability of a host to react to a
given antigen (responsiveness) is influenced by the spectrum of MHC molecules expressed by the host.
Responsiveness correlates with the ability of specific peptide fragments to bind to particular MHC 
molecules. 

There are two types of MHC molecules, class I and class II, each of which comprises two chains. In
class I, the alpha chain is composed of three extracellular domains, a transmembrane region, and a
cytoplasmic tail. The beta chain (beta-2-microglobulin) is composed of a single extracellular domain.
In class II, both the alpha and the beta chains are composed of two extracellular domains, a
transmembrane region and a cytoplasmic tail. MHC class I molecules are expressed on the surface of all 
cells, and MHC class II molecules are expressed on the surface of antigen presenting cells. MHC class II 
molecules bind to peptides derived from proteins made outside of an antigen presenting cell. In contrast,
MHC class I molecules bind to peptides derived from proteins made inside a cell. In order to present
peptide in the context of a class II molecule, an antigen presenting cell phagocytoses an antigen into an 
intracellular vesicle, in which the antigen is cleaved, bound to an MHC class II molecule, and then 
returned to the surface of the antigen presenting cell.

Major histocompatibility complex (MHC) class I molecules present antigenic peptides to CD8 T
cells (Townsend, A. et al., Nature 340:443-448). The peptides are generated in the cytosol and then
translocated across the membrane of the endoplasmic reticulum by the transporter associated with antigen
processing (TAP). TAP is a trimeric complex consisting of TAP1, TAP2, and tapasin (TAP-A). TAP1
and TAP2 are required for the peptide transport. Tapasin mediates the interaction of MHC class I HC-
beta-2-microglobulin with TAP, and this interaction is essential for peptide loading onto MHC class I HC-
beta-2-microglobulin (Suling et al., J. Biol. Chem., 274:8649-8654). T cell receptors (TCRs) are the
second antigen recognition molecules, and recognize antigens that are bound by MHC molecules. 
Recognition of MHC complexed with peptide (MHC-peptide complex) by TCR can affect the activity of
the T cell bearing the TCR. Thus, MHC-peptide complexes are important in the regulation of T cell
activity and, thus, in regulating an immune response.

Human cytomegalovirus (HCMV) is a betaherpesvirus which causes clinically serious disease in 
immunocompromised and immunosuppressed adults, as well as in some infants infected in utero or 
perinatally (Alford, C. A., and W. J. Britt. 1990. Cytomegalovirus, p. 1981-2010. In D. M. Knipe and B.
N. Fields (ed.), Virology, 2nd ed. Raven Press, New York). In human cytomegalovirus (HCMV)-infected
cells, expression of the cellular major histocompatibility complex (MHC) class I heavy chains is down-
regulated, where down-regulation is defined as reduction in either synthesis, stability or surface 
expression of MHC class I heavy chains. A similar phenomenon has been reported for some other DNA 
viruses, including adenovirus, murine cytomegalovirus, and herpes simplex virus (Anderson, M., et al., 
Cell 43:215-222, 1985; Burgert and Kvist, Cell 41:987-997, 1985; Heise T. M., et al., J. Exp. Med.
187:1037-1046, 1998). In the adenovirus and herpes simplex virus systems, the product of a viral gene
which is dispensable for replication in vitro is sufficient to cause down-regulation of MHC class I heavy 
chains (Anderson, M., et al., 1985, supra). The gene(s) involved in class I heavy chain down-regulation 
by murine cytomegalovirus have not yet been identified. 

It is believed that the proteins of SEQ ID NOs:283 and 286 are members of the immunoglobulin
superfamily and, as such, play a role in the immune response, cellular proteolysis, cell proliferation and
differentiation, pathogen recognition, apoptosis, and other processes associated with the Ig superfamily. 
In addition, the proteins of the invention are thought to be tightly linked to the antigen processing and 
presentation system in the context of peptide assembly and translocation of foreign peptides across 
endoplasmic reticulum and cell surface membranes as new chaperonin-like proteins associated with MHC
I and TAP proteins. The weak homology (30%) with the TAP protein family is thought to indicate the
specificity of the interactions of the proteins of the invention with MHC proteins and/or TAP-related 
proteins, as described by Suling et al., supra. 

Preferred polypeptides of the invention are polypeptides comprising the amino acids of SEQ ID
NO:283 from position 9 to 23, 199 to 219, 205 to 285, 318 to 384, 319 to 336, 380 to 386 and from 406 to
426. Other preferred polypeptides of the invention are fragments of SEQ ID NO:283 having any of the 
biological activities described herein. 
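
The position ranges listed above are 1-based and inclusive, so a fragment from position 9 to 23 contains 15 residues. As a minimal sketch, assuming the full-length sequence is available as a plain string, the corresponding fragments could be extracted as follows. The region labels are taken from the description above; the function name and the placeholder sequence (a run of "A" residues of the stated length of 468) are illustrative assumptions and do not represent SEQ ID NO:283.

```python
# Position ranges (1-based, inclusive) as listed in the description.
PREFERRED_REGIONS = {
    "signal peptide": (9, 23),
    "transmembrane domain 1": (199, 219),
    "immunoglobulin domain 1": (205, 285),
    "immunoglobulin domain 2": (318, 384),
    "Ig/MHC signature (emotif)": (319, 336),
    "Ig/MHC pattern (prosite)": (380, 386),
    "transmembrane domain 2": (406, 426),
}

def extract_fragment(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} is outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]

# Placeholder sequence of the stated length; NOT the real sequence.
placeholder = "A" * 468

for name, (start, end) in PREFERRED_REGIONS.items():
    fragment = extract_fragment(placeholder, start, end)
    print(f"{name}: positions {start}-{end}, {len(fragment)} residues")
```

The same helper applies to the 442-amino-acid variant of SEQ ID NO:286, provided the requested range lies within its shorter length; out-of-range requests raise an error rather than returning a truncated fragment.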

In one embodiment, the invention relates to methods and compositions for using the protein of the 
invention or part thereof as a marker protein to selectively identify tissues, such as salivary glands and
bone marrow tissues, which strongly express the protein of the invention. For example, the protein of the
invention or part thereof may be used to synthesize specific antibodies using any techniques known to
those skilled in the art including those described herein. Such tissue-specific antibodies may then be
used to identify tissues of unknown origin, for example, forensic samples, differentiated tumor tissue that
has metastasized to foreign bodily sites, or to differentiate different tissue types in a tissue cross-section
using immunochemistry.

In another embodiment, the invention relates to methods for using the protein of the invention to 
visualize proteins and peptides involved in the antigen recognition system within cells by virtue of their
physical interaction with the proteins of the invention. For example, the protein may be used to detect the
presence and/or the localization of MHC peptides and TAP-like proteins in a cell. The protein of the
invention, and hence any interacting proteins, can be labeled using any of a number of methods, including
by binding with specific antibodies or by creating a fusion protein comprising the protein of the invention 
as well as a readily detectable moiety, such as an epitope tag, biotin, or green fluorescent protein. 

In another embodiment, polynucleotide or polypeptide sequences of the invention or part thereof 
may be used for the diagnosis of a disorder associated with a loss of regulation of the expression of the
protein of the invention, preferably, but not limited to, deficiencies of the MHC protein system.
Examples of such disorders include, but are not limited to, acquired immunodeficiency syndrome (AIDS),
X-linked agammaglobulinemia of Bruton, common variable immunodeficiency (CVI), DiGeorge's
syndrome (thymic hypoplasia), thymic dysplasia, isolated IgA deficiency, severe combined
immunodeficiency disease (SCID), immunodeficiency with thrombocytopenia and eczema (Wiskott-
Aldrich syndrome), Chediak-Higashi syndrome, chronic granulomatous diseases, hereditary
angioneurotic edema, immunodeficiency associated with Cushing's disease, Addison's disease, adult 
respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, 
atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, cholecystitis, contact 
dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, episodic
lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema nodosum, atrophic gastritis,
glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, 
hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or 
pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's 
syndrome, rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus
erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner 
syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, leukemias such as 
multiple myeloma, and lymphomas such as Hodgkin's disease; a cell proliferative disorder such as 
arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD),
myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary
thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, 
sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland, bladder, bone, bone marrow, 
brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, 
pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus;
and an infection, such as infections by viral agents classified as adenovirus, arenavirus, bunyavirus,
calicivirus, coronavirus, filovirus, hepadnavirus, herpesvirus, flavivirus, orthomyxovirus, parvovirus,
papovavirus, paramyxovirus, picornavirus, poxvirus, reovirus, retrovirus, rhabdovirus, and togavirus;
infections by bacterial agents classified as pneumococcus, staphylococcus, streptococcus, bacillus,
corynebacterium, Clostridium, meningococcus, gonococcus, listeria, moraxella, kingella, haemophilus,
legionella, bordetella, gram-negative enterobacterium including shigella, salmonella, and Campylobacter,
pseudomonas, vibrio, brucella, francisella, yersinia, bartonella, nocardia, actinomyces, mycobacterium,
spirochaetale, rickettsia, chlamydia, and mycoplasma; infections by fungal agents classified as
aspergillus, blastomyces, dermatophytes, cryptococcus, coccidioides, malassezia, histoplasma, and other
fungal agents causing various mycoses; and infections by parasites classified as Plasmodium or malaria-
causing, parasitic entamoeba, leishmania, trypanosoma, toxoplasma, Pneumocystis carinii, intestinal
protozoa such as giardia, trichomonas, tissue nematodes such as trichinella, intestinal nematodes such as
ascaris, lymphatic filarial nematodes, trematodes such as schistosoma, and cestodes such as tapeworm.
To assess abnormal expression of the present protein associated with any of these disorders, the level of 
the present polynucleotides or polypeptides can be detected in a biological sample or cell using any 
standard method, including Southern or northern analysis, dot blots, other membrane-based technologies, 
PCR technologies, dipstick, pin, ELISA assays, and microarrays. Any of these methods may be used 
for the diagnosis of disorders characterized by an alteration of expression of SEQ ID NO:283 or 286, such 
as the disorders mentioned above, or in assays to monitor patients being treated with SEQ ID NO:283 or 
286 or agonists, antagonists, or inhibitors of SEQ ID NO:283 or 286. Antibodies useful for diagnostic 
purposes may be prepared, e.g., in the same manner as that described in U.S. Patent No. 6,135,941. 
Diagnostic assays for SEQ ID NO:283 or 286 include methods which utilize the antibody and a label to 
detect SEQ ID NO: 283 or 286 in human body fluids or in extracts of cells or tissues. The antibodies may 
be used with or without modification, and may be labeled by covalent or non-covalent attachment of a 
reporter molecule. A wide variety of reporter molecules, several of which are described above, are 
known in the art and may be used. 

In another embodiment, the protein of SEQ ID NO:283 or 286 or a fragment or derivative thereof 
may be administered to a subject to diagnose, treat or prevent an immune disorder associated with 
decreased expression or activity of the protein of the invention. Such disorders can include, but are not 
limited to, acquired immunodeficiency syndrome (AIDS), X-linked agammaglobulinemia of Bruton, 
common variable immunodeficiency (CVI), DiGeorge's syndrome (thymic hypoplasia), thymic dysplasia, 
isolated IgA deficiency, severe combined immunodeficiency disease (SCID), immunodeficiency with 
thrombocytopenia and eczema (Wiskott-Aldrich syndrome), Chediak-Higashi syndrome, chronic 
granulomatous diseases, hereditary angioneurotic edema, immunodeficiency associated with Cushing's 
disease, Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, 
amyloidosis, anemia, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, 
bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes 
mellitus, emphysema, episodic lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema 
nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, 
Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia 
gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, 
psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic 
anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative 
colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, 
leukemias such as multiple myeloma, and lymphomas such as Hodgkin's disease. In addition, such 
disorders associated with decreased protein expression or activity can be treated by administering to a 
patient polynucleotide sequences encoding the protein of the invention, e.g. inserted in an appropriate 
vector. In another example, a compound that increases either the activity of the protein of the invention 
or its expression can be administered to a patient to treat or prevent any of the diseases mentioned 
above. 

In a further embodiment, an antagonist of the protein of the invention may be administered to a 
subject to treat or prevent an immune disorder associated with increased expression or activity of the 
protein of SEQ ID NO:283 or 286 including, but not limited to, autoimmune diseases or graft rejection. 
In one aspect, an antibody which specifically binds the protein of the invention may be used directly as an 
antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells 
or tissues which express the proteins of the invention, such as the salivary gland tissue or the bone 
marrow tissue. In addition, sense or antisense nucleotides, GSE, ribozymes, or specific protein inhibitors such 
as antibodies or small compounds can be administered to inhibit the expression of the proteins of the 
invention. 

In another embodiment, an antagonist of the protein of SEQ ID NO:283 may be administered to a 
subject to treat or prevent a cell proliferative disorder. Such disorders may include, but are not limited to, 
arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), 
myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary 
thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, 
sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland, bladder, bone, bone marrow, 
brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, 
pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus. In 
one aspect, an antibody which specifically binds the protein of the invention may be used directly as an 
antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells 
or tissue which express the protein of the invention. In another example, sense, antisense nucleotides, 
GSE, or ribozymes designed from nucleotides of the invention can be administered to inhibit the 
expression of the protein of the invention. 

Protein of SEQ ID NO: 411 (internal designation 181-10-1-0-C9-CS) 

The protein of SEQ ID NO: 411 encoded by the cDNA of SEQ ID NO: 170 is highly expressed in 
fetal liver. The protein of the invention is homologous to peripheral benzodiazepine receptor/isoquinoline 
binding protein (PBR/IBP) of human, bovine and murine origin (Genbank accession numbers M36035, 
M64520 and L17306 respectively). The 170-amino-acid protein of SEQ ID NO: 411 is similar in size and 
hydropathicity to known peripheral benzodiazepine receptors/isoquinoline binding proteins (PBR/IBP). 
Like the known peripheral benzodiazepine receptors/isoquinoline binding proteins, the protein of the 
subject invention has about five potential transmembrane domains at positions 3-23, 45-65, 82-102, 
105-125 and 130-150. Moreover, the protein of the invention displays a stretch of 11 amino acids (starting 
with V144 and ending with R154) that corresponds to a recently identified putative cholesterol 
recognition/interaction amino acid consensus pattern (-L/V-(X)(1-5)-Y-(X)(1-5)-R/K-) [See Li et al., 
Endocrinology 1998 Dec; 139 (12): 4991-7]. 
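
The consensus pattern cited above can be turned into a simple sequence scan. The following is a minimal Python sketch, assuming the pattern is read as L or V, then one to five arbitrary residues, then Y, then one to five arbitrary residues, then R or K. The function name `find_crac_motifs` and the test peptides are hypothetical illustrations; they are not taken from SEQ ID NO: 411.

```python
import re

# -L/V-(X)(1-5)-Y-(X)(1-5)-R/K- written as a regular expression.
# The lookahead allows overlapping candidate start positions to be reported.
CRAC_PATTERN = re.compile(r"(?=([LV].{1,5}Y.{1,5}[RK]))")


def find_crac_motifs(sequence):
    """Return (start, end, segment) tuples using 1-based, inclusive positions."""
    hits = []
    for match in CRAC_PATTERN.finditer(sequence.upper()):
        segment = match.group(1)
        start = match.start() + 1
        hits.append((start, start + len(segment) - 1, segment))
    return hits


if __name__ == "__main__":
    # Hypothetical 11-residue peptide used only to exercise the scan.
    for start, end, segment in find_crac_motifs("VAAAAYAAAAR"):
        print(start, end, segment)
```

One candidate segment is reported per start position. A match of this kind only flags a candidate region; it does not by itself establish cholesterol binding.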

The peripheral benzodiazepine receptor (PBR) is an 18-kDa protein containing binding sites for 
benzodiazepine and is distinct from the GABA neurotransmitter receptor [Papadopoulos, V. (1993) 
Endocr. Rev. 14: 222-240]. Expression of PBR has been found in every tissue examined. However, it is 
most abundant in steroidogenic cells and is found primarily on outer mitochondrial membranes 
[Anholt, R. et al. (1986) J. Biol. Chem. 261:576-583]. PBR is thought to be associated with a multimeric 
complex composed of the 18-kDa isoquinoline binding protein and the 34-kDa pore-forming voltage 
dependent anion channel protein, preferentially located on the outer/inner mitochondrial membrane 
contact sites [McEnery, M.W. et al. Proc. Natl. Acad. Sci. USA 89:3170-3174; Garnier, M. et al. (1994) 
Mol. Pharmacol. 45:201-211; Papadopoulos, V. et al. (1994) Mol. Cell. Endocr. 104:R5-R9]. Drug 
ligands of PBR, upon binding to the receptor, stimulate steroid synthesis in steroidogenic cells in vitro 
[Papadopoulos, V. et al. (1990) J. Biol. Chem. 265: 3772-3779; Barnea, E. R. et al. (1989) Mol. Cell. 
Endocr. 64: 155-159; Amsterdam, A. and Suh, B.S. (1991) Endocrinology 128: 503-510]. Likewise, in 
vivo studies showed that high affinity PBR ligands increase steroid plasma levels in hypophysectomized 
rats [Amri, H. et al. (1996) Endocrinology 137:5707-5718]. Further in vitro studies on isolated 
mitochondria provided evidence that PBR ligands, whether drug ligands or the endogenous PBR ligand (the 
polypeptide diazepam-binding inhibitor (DBI) [Papadopoulos, V. et al. (1997) Steroids 62: 21-28]), 
stimulate pregnenolone formation by increasing the rate of cholesterol transfer from the outer to the inner 
mitochondrial membrane [for review, see Culty, M. et al. (1999) Journal of Steroid Biochemistry and 
Molecular Biology 69: 123-130]. 

Based on the amino acid sequence of the 18-kDa PBR, a three dimensional model was developed 
[Papadopoulos, V. (1996) In: The Leydig Cell. Payne, A. H. et al. (eds) Cache River Press, IL, pp 596-628]. 
This model was shown to accommodate a cholesterol molecule and function as a channel, 
supporting the role of PBR in cholesterol transport. The role of PBR in steroidogenesis was also 
demonstrated by observing that PBR negative cells generated by homologous recombination failed to 
produce steroids [Papadopoulos, V. et al. (1997) J. Biol. Chem. 272: 32129-32135]. Further, cholesterol 
transport experiments in bacteria expressing the 18-kDa PBR protein provided definitive evidence for a 
function as a cholesterol channel/transporter [Papadopoulos, V. et al. (1997) supra]. 

In addition to its role in mediating cholesterol movement across membranes, PBR has been 
implicated in several other physiological functions, including cell growth and differentiation, chemotaxis, 
mitochondrial physiology, porphyrin and heme biosynthesis, immune response, anion transport and 
GABAergic regulation of the CNS [for review, see Gavish, M. et al. (1999) Pharmacological Reviews 51: 
629-650; Beurdeley-Thomas, A. et al. (2000) Journal of Neuro-Oncology 46: 45-56]. A recent 
report also indicates that PBR agonists are potent anti-apoptotic compounds. These findings suggest that 
this effect may represent a major function for this receptor [Bono, F. et al. (1999) Biochemical and 
Biophysical Research Communications 265:457-461]. 

It appears that PBR is associated with stress and anxiety disorders. It has been suggested that 
PBRs play a role in the regulation of several stress systems such as the HPA axis, the sympathetic 
nervous system, the renin-angiotensin axis, and the neuroendocrine axis. In these systems, acute stress 
typically leads to increases in PBR density, whereas chronic stress typically leads to decreases in PBR 
density. Furthermore, in Generalized Anxiety Disorder (GAD), Panic Disorder (PD), Generalized Social 
Phobia (GSP), and Post-Traumatic Stress Disorders (PTSD), PBR density is typically decreased in 
platelets. 

In the brain, where PBRs are associated with glial cells, PBRs are increased in specific brain areas 
in neurodegenerative disorders and also after neurotoxic and traumatic-ischemic brain damage [for 
review, see Gavish, M. et al. (1999) supra]. The literature also reports a decrease in peripheral-type 
benzodiazepine receptors in postmortem brain tissue of chronic schizophrenics, suggesting that the decreased density 
of PBRs in the brain may be involved in the pathophysiology of schizophrenia. Increased levels of PBR 
in autopsied brain tissue from patients with portal-systemic encephalopathy (PSE) have been 
reported, thus supporting the theory that activation of PBR contributes to the pathogenesis 
of PSE in the central nervous system [Kurumaji, A. et al. (1997) J. 
Neural Transm. 104:1361-1370; Butterworth, R. F. (2000) Neurochemistry International 36: 411-416]. 

In addition to its involvement in the neurological disorders discussed supra, PBR has been 
implicated in the regulation of tumor cell proliferation [for review, see Gavish, M. et al. (1999) supra; 
Beurdeley-Thomas, A. et al. (2000) supra; Hardwick, M. (1999) Cancer Research 59:831-842; Venturini, 
I. et al. (1998) Life Sci. 63:1269-80; Carmel, I. et al. (1999) Biochem. Pharmacol. 58: 273-8]. The 
invasiveness and metastatic ability of human breast tumor cells is proportional to the level of PBR 
expressed. Further, PBR has been proposed as a tool/marker for detection, diagnosis, 
prognosis and treatment of cancer [WO 99/49316, hereby incorporated by reference in its entirety]. 

Many ligands have been described that bind to the peripheral benzodiazepine receptor with various 
affinities. Some benzodiazepines, such as Ro 5-4864 [4-chlorodiazepam], diazepam and structurally related 
compounds, are potent and selective PBR ligands. Exogenous ligands also include 2-phenylquinoline 
carboxamides (PK11195 series), imidazo[1,2-a]pyridine-3-acetamides (Alpidem series) and pyridazine 
derivatives. Some endogenous compounds, including porphyrins and diazepam binding inhibitor (DBI), 
bind to PBR with nanomolar and micromolar affinity [for review, see Gavish, M. et al. (1999) supra; 
Beurdeley-Thomas, A. et al. (2000) supra]. 

The protein of SEQ ID NO: 411 is a novel peripheral-type benzodiazepine receptor. As such, it 
serves a channel function that mediates cholesterol movement across membranes, and plays a role in 
steroidogenesis, cell growth and differentiation, chemotaxis, mitochondrial physiology, protection against 
apoptosis, porphyrin and heme biosynthesis, immune response, anion transport and GABAergic 
regulation of the CNS. 

In one embodiment, a preferred polypeptide of the invention comprises the amino acids of SEQ 
ID NO: 411 from position 144 to 154. In another embodiment, the subject invention provides a 
polypeptide comprising the sequence of SEQ ID NO: 411. Other preferred polypeptides of the invention 
include biologically active fragments of SEQ ID NO: 411. Biologically active fragments of the protein of 
SEQ ID NO: 411 have any of the biological activities described herein which are associated with the 
PBR. In another embodiment, the polypeptide of the invention is encoded by clone 181-10-1-0-C9-CS. 

One aspect of the subject invention provides compositions and methods using the protein of the 
invention, or biologically active fragments thereof, for the development, identification, and/or selection of 
agents capable of modulating the expression or activity of the protein of the invention. 

Agents which modulate the activity of the PBR/IBP of the subject invention include, but are not 
limited to, antisense oligonucleotides, ribozymes, drugs, and antibodies. These agents may be made and 
used according to methods well known in the art. Also, the protein of the invention, or biologically active 
fragments thereof, may be used in screening assays for therapeutic compounds. A variety of drug 
screening techniques may be employed. In this aspect of the invention, the protein, or biologically active 
fragment thereof, may be free in solution, affixed to a solid support, recombinantly expressed on, or 
chemically attached to, a cell surface, or located intracellularly. The formation of binding complexes 
between the protein of the invention, or biologically active fragments thereof, and the compound being 
tested may then be measured. 

In one embodiment, the subject method utilizes eukaryotic or prokaryotic host cells which are 
stably transformed with recombinant nucleic acids expressing the PBR/IBP polypeptide or biologically 
active fragments thereof. The transformed cells may be viable or fixed. Drugs or compounds which are 
candidates for the modulation of the PBR/IBP, or biologically active fragments thereof, are screened 
against such transformed cells in binding assays well known to those skilled in the art. Alternatively, 
assays such as those taught in Geysen H. N., WO Application 84/03564, published on Sep. 13, 1984, and 
incorporated herein by reference in its entirety, may be used to screen for peptide compounds which 
demonstrate binding affinity for, or the ability to modulate, the PBR/IBP, or biologically active fragments 
thereof. In another embodiment, competitive drug screening assays use neutralizing antibodies that 
specifically compete with a test compound for binding to the PBR/IBP protein of the invention, or 
biologically active fragments thereof. 

Another embodiment of the subject invention provides compositions and methods of selectively 
modulating the expression or activity of the protein of the invention. Modulation of the PBR/IBP would 
allow for the successful treatment and/or management of diseases or biochemical abnormalities associated 
with the PBR or PBR/IBP. Antagonists, able to reduce or inhibit the expression or the activity of the 
protein of the invention, would be useful in the treatment of diseases associated with elevated levels of the 
PBR/IBP, increased cell proliferation, or increased cholesterol transport. Thus, the subject invention 
provides methods for treating a variety of diseases or disorders, including, but not limited to, cancers, 
especially liver cancer, and portal-systemic encephalopathy. 

Alternatively, the subject invention provides methods of treating diseases or disorders associated 
with decreased levels of the PBR/IBP. Thus, the subject invention provides methods of 
treating diseases including, but not limited to, schizophrenia, chronic stress, GAD, PD, GSP and PTSD. 
Other diseases which may be treated by agonists of the PBR/IBP of the subject invention include those 
diseases associated with decreases in cell proliferation, e.g. developmental retardation. 

Furthermore, because the PBR/IBP of the subject invention is also able to transport cholesterol 
into cells, the subject invention may also be used to increase cholesterol transport into cells. Diseases 
associated with cholesterol transport deficiencies include lipoidal adrenal hyperplasia, and diseases where 
there is a requirement for increased production of compounds requiring cholesterol, such as myelin, 
for example Alzheimer's disease, spinal cord injury, and brain development neuropathy [Snipes, 
G. and Suter, U. (1997) Cholesterol and Myelin. In: Subcellular Biochemistry, Robert Bittman (ed.), vol 
28, pp. 173-204, Plenum Press, New York]. The methods of treating disorders associated with decreased 
levels of PBR/IBP may be practiced by introducing agonists which stimulate the expression or the activity 
of the protein of the invention. 

In one embodiment, methods of increasing the levels of PBR/IBP in tissues or cell types may be 
practiced by utilizing nucleic acids encoding the protein of the subject invention, or biologically active 
fragments thereof, to introduce biologically active polypeptide into targeted cell types. Vectors useful in 
such methods are known to those skilled in the art as are methods of introducing such nucleic acids into 
target tissues. 

Agents which stimulate or inhibit the activity of the protein of the invention include but are not 
limited to agonist and antagonist drugs respectively. These drugs can be obtained using any of a variety of 
drug screening techniques as discussed above. 

Antagonists of the PBR/IBP encoded by SEQ ID NO: 170 include agents which decrease the 
levels of expressed mRNA encoding the protein of SEQ ID NO: 411. These include, but are not limited 
to, RNAi, one or more ribozymes capable of digesting the mRNA encoding the protein of the invention, or antisense 
oligonucleotides capable of hybridizing to mRNA encoding the PBR/IBP of SEQ ID NO: 411. Antisense 
oligonucleotides can be administered as DNA, as DNA entrapped in proteoliposomes containing viral 
envelope receptor proteins [Kaneda, Y. et al. (1989) Science 243: 375] or as part of a vector which can be 
expressed in the target cell and provide antisense DNA or RNA. Vectors which are expressed in particular 
cell types are known in the art. Alternatively, the DNA can be injected along with a carrier. A carrier can 
be a protein such as a cytokine, for example interleukin 2, or polylysine-glycoprotein carriers. Carrier 
proteins, vectors, and methods of making and using polylysine carrier systems are known in the art. 
Alternatively, nucleic acid encoding antisense molecules may be coated onto gold beads and introduced 
into the skin with, for example, a gene gun [Ulmer, J.B. et al. (1993) Science 259:1745]. 
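
As an illustration of how an antisense oligonucleotide relates to its target, the following minimal Python sketch derives the reverse complement of a short target segment. It assumes the target is supplied 5' to 3' as an mRNA or cDNA string; the function name `antisense_dna` and the six-base input are hypothetical and are not taken from SEQ ID NO: 170. Practical antisense design also weighs length, target accessibility and specificity, none of which is modeled here.

```python
# Map each base of the target strand (RNA or DNA alphabet) to its DNA complement.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_dna(target_segment):
    """Return the antisense DNA oligonucleotide, written 5'->3', for a 5'->3' target."""
    seq = target_segment.upper()
    unknown = set(seq) - set("ACGTU")
    if unknown:
        raise ValueError(f"unexpected characters in sequence: {sorted(unknown)}")
    # Complement every base, then reverse so the result also reads 5'->3'.
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical six-base target segment.
    print(antisense_dna("AUGGCC"))
```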

Antibodies, or other polypeptides, capable of reducing or inhibiting the activity of PBR/IBP may 
be provided in isolated and substantially purified form. Alternatively, antibodies or other polypeptides 
capable of inhibiting or reducing the activity of the PBR/IBP protein may be recombinantly expressed in 
the target cell to provide a modulating effect. In addition, compounds which inhibit or reduce the activity 
of the PBR/IBP protein of the subject invention may be incorporated into biodegradable polymers 
implanted in the vicinity of where drug delivery is desired. For example, biodegradable polymers may be 
implanted at the site of a tumor or, alternatively, biodegradable polymers containing antagonists/agonists 
may be implanted to slowly release the compounds systemically. Biodegradable polymers, and their use, 
are known to those of skill in the art (see, for example, Brem et al. (1991) J. Neurosurg. 74:441-446). 

In another embodiment, the invention provides methods and compositions for detecting the level 
of expression of the mRNA of the protein of the invention. Quantification of mRNA levels of the 
PBR/IBP protein of the invention may be useful for the diagnosis or prognosis of diseases associated with 
an altered expression of the protein of the invention. Assays for the detection and quantification of the 
mRNA of the protein of the invention are well known in the art (see, for example, Maniatis, Fritsch and 
Sambrook, Molecular Cloning: A Laboratory Manual (1982), or Current Protocols in Molecular Biology, 
Ausubel, F.M. et al. (Eds), Wiley & Sons, Inc.). 

Polynucleotide probes or primers for the detection of the mRNA of the protein of SEQ ID NO: 
411 can be designed from the cDNA of SEQ ID NO: 170. Methods for designing probes and primers are 
known in the art. In another embodiment, the subject invention provides diagnostic kits for the detection 
of the mRNA of the protein of the invention in cells. The kit comprises a package having one or more 
containers of oligonucleotide primers for detection of the mRNA of the protein of the invention in PCR assays or one or 
more containers of polynucleotide probes for the detection of the mRNA of the protein of the invention 
by in situ hybridization or Northern analysis. Kits may, optionally, include containers of various reagents 
used in various hybridization assays. The kit may also, optionally, contain one or more of the following 
items: polymerization enzymes, buffers, instructions, controls, or detection labels. Kits may also, 
optionally, include containers of reagents mixed together in suitable proportions for performing the 
hybridization assay methods in accordance with the invention. Reagent containers preferably contain 
reagents in unit quantities that obviate measuring steps when performing the subject methods. 

In another embodiment, the invention relates to methods and compositions for detecting and 
quantifying the level of the protein of the invention present in a particular biological sample. These 
methods are useful for the diagnosis or prognosis of diseases associated with altered levels of the 
protein of the invention. Diagnostic assays to detect the protein of the invention may comprise a biopsy, 
in situ assay of cells from organ or tissue sections, or an aspirate of cells from a tumor or normal tissue. In 
addition, assays may be conducted upon cellular extracts from organs, tissues, cells, urine, or serum or 
blood or any other body fluid or extract. 

Assays for the quantification of the PBR/IBP of SEQ ID NO: 411 may be performed according to 
methods well known in the art. Typically, these assays comprise contacting the sample with a ligand of 
the protein of the invention or an antibody (polyclonal or monoclonal) which recognizes the protein of the 
invention or a fragment thereof, and detecting the complex formed between the protein of the invention 
present in the sample and the ligand or antibody. Fragments of the ligands and antibodies may also be 
used in the binding assays, provided these fragments are capable of specifically interacting with the 
PBR/IBP of the subject invention. Further, the ligands and antibodies which bind to the PBR/IBP of the 
invention may be labeled according to methods known in the art. Labels which are useful in the subject 
invention include, but are not limited to, enzyme labels, radioisotopic labels, paramagnetic labels, and 
chemiluminescent labels. Typical techniques are described by Kennedy, J. H., et al. (1976) Clin. Chim. 
Acta 70:1-31; and Schuurs, A. H. et al. (1977) Clin. Chim. Acta 81: 1-40. 

The subject invention also provides methods and compositions for the identification of metastatic 
tumor masses. In this aspect of the invention, the polypeptides and antibodies which bind the 
polypeptides of the invention may be used as a marker for the identification of the metastatic tumor mass. 
Metastatic tumors which originated from the liver may overexpress the PBR/IBP of SEQ ID NO: 411, 
whereas newly forming tumors, or those originating from other tissues, are not expected to bear the 
PBR/IBP of SEQ ID NO: 411. 

Protein of SEQ ID NO: 397 (internal designation 160-28-4-0-C4-CS). 

The protein of SEQ ID NO: 397, encoded by the cDNA of SEQ ID NO: 156 (clone 160-28-4-0-C4-CS), 
exhibits homology to the ADP-ribosylation factor (ARF) family of proteins. The ARF family 
includes ADP-ribosylation factors (ARFs) and ARF-like proteins (ARLs); the ARF family of proteins is 
one family of the Ras superfamily. Proteins belonging to the Ras superfamily have molecular weights of 
18-30 kDa and function in a variety of cellular processes including, but not limited to, signaling, growth, 
immunity, and protein transport. 

ARFs are monomeric GTP-binding proteins, related structurally to both G protein alpha-subunits 
and Ras proteins. ARF family members share more than 60% sequence identity, appear to be ubiquitous 
in eukaryotes, and are evolutionarily highly conserved throughout. Immunologically, they have been 
localized to the Golgi apparatus of several types of cells (Stearns et al. Proc. Natl. Acad. Sci. (USA) 
87:1238-1242 (1990)). ARF proteins enhance the ADP-ribosyltransferase activity of cholera toxin as an 
allosteric activator (Noda et al. Biochim. Biophys. Acta 1034: 195-199 (1990)). ARFs have also been 
shown to act as regulatory molecules, or "switches", for linking two processes, e.g., the process of vesicle 
fission from a donor compartment and fusion with an acceptor compartment (Rothman, J. E. and 
Wieland, F. T. Science 272: 227-234 (1996)). ARF family members fall into three classes, classes I-III, 
according to their size and sequence homology. Class I comprises ARF1, ARF2, and ARF3; Class II 
comprises ARF4 and ARF5; and Class III comprises ARF6. 

The classes occupy different subcellular locations and have been implicated in different transport 
pathways. Class I ARFs localize to the Golgi where they are involved in the regulation of ER-Golgi and 
intra-Golgi transport. Class I ARFs are also involved in the recruitment of cytosolic coat proteins to Golgi 
membranes during the formation of transport vesicles. Class III (e.g., ARF6) localizes to a tubulovesicular 
compartment, secretory granules, and the plasma membrane, where it is involved in regulated secretion 
and recycling. Class II ARFs appear to be cytosolic, but their role has not been elucidated (Radhakrishna, 
H. and Donaldson, J. G. J. Cell Biol. 139: 49-61 (1997)). 

ARF function, in general, is regulated by a GDP-GTP cycle. For example, ARF1 is cytosolic in 
the GDP bound state, but is associated with membranes when in the GTP bound state. A guanine 
nucleotide exchange factor (GEF) in the donor compartment recruits ARF1 to the membrane. At the 
membrane, GTP-ARF1 recruits coat proteins, which assemble together into spherical coats, budding off 
vesicles in the process. After budding, hydrolysis of bound GTP causes ARF1 to dissociate from the 
membrane. ARF1 dissociation causes the coat to become unstable and dissociate as well (Rothman, 
supra). 

Members of the ARF multigene family, when expressed as recombinant proteins in E. coli, 
display different phospholipid and detergent requirements (Price, et al. J. Biol. Chem. 267: 17766-17772 
(1992)). Some lipids and/or detergents, e.g., SDS, cardiolipin, dimyristoylphosphatidylcholine 
(DMPC)/cholate, enhance ARF activities (Bobak, et al. Biochemistry 29:855-861 (1990); Noda, et al. 
Biochim. Biophys. Acta 1034: 195-199 (1990); Tsai, et al. J. Biol. Chem. 263:1768-1772 (1988)). ARFs 
also activate phospholipase D (PLD), a membrane-bound enzyme implicated as an effector of several 
growth factors (Boman, A. L. and Kahn, R. A. Trends Biochem. Sci. 20: 147-150 (1995)). PLD1 has been 
shown to be activated by a variety of G-protein regulators, for example, PKC (protein kinase C) and 
ADP-ribosylation factor (ARF). PKC and ARFs may regulate G-proteins either individually or together 
in a synergistic manner. Recently the role of ARFs in microtubule formation has also been 
demonstrated. ADP-ribosylation of tubulin almost completely blocked self-assembly of this protein in 
brain (Terashima M. et al. J. Nutr. Sci. Vitaminol. 45: 393-400 (1999)). 

In general, differences in the various ARF sequences are concentrated in the amino-terminal 
regions and the carboxyl portions of the proteins. Only three of 17 amino acids in the amino termini have 
been shown to be identical among ARFs, and four amino acids in this region of ARFs 1-5 are missing in 
ARF6 (Tsuchiya, et al. J. Biol. Chem. 266: 2772-2777 (1991)). It was reported (Kahn, et al. J. Biol. Chem. 
267:13039-13046 (1992)) that the amino-terminal regions of ARF proteins form an alpha-helix and that 
this domain is required for membrane targeting, interaction with lipid, and ARF activity. 

Schliefer et al. (J. Biol. Chem. 257: 20-23 (1991)) have described a protein distinctly larger than 
ARF that possessed ARF-like activity. ARF-like proteins, or ARLs, have been found in different species. 
Some ARLs appear to lack ADP-ribosyltransferase-enhancing activity; ARLs may differ in GTP-binding 
requirements and GTPase activity as compared to various ARF isoforms. For example, ARP, a 
mammalian ARL, is 33-39% identical to members of the ARF family; ARP, however, differs from other 
ARF family proteins by virtue of its ability to hydrolyze bound GTP in the absence of other proteins. 
ARP protein, unlike ARFs, is typically associated with plasma membrane instead of the cytosol 
(Schurmann, A. J. Biol. Chem. 270, 30657-30663 (1995)). 

ARF family members have been implicated in several disease processes, such as Lowe's 
syndrome, an X-linked disorder characterized by congenital cataracts, renal tubular dysfunction and 
neurological deficits. These disorders may be due to an inability to recruit ARF to the Golgi membrane 
(Suchy, S. F. et al. Hum. Mol. Genet. 4: 2245-2250 (1995), Londono I. et al. Kidney Int. 55: 1407-1416 
(1999)). It has also been suggested that regulation of ARF is involved in cystic fibrosis, Dent's 
disease, diabetes, and autosomal dominant polycystic kidney disease (Marshansky, V., et al. 
Electrophoresis 18: 2661-2676 (1997)). 

The new human ARF-related protein of SEQ ID NO: 397, encoded by clone 160-28-4-0-C4-CS in
one embodiment, and the related polynucleotides, provide new compositions which are useful in the
diagnosis, treatment, and prevention of secretory, exocytosis, endocytosis and other "sorting disorders."

The subject invention provides a polypeptide comprising the amino acid sequence of SEQ ID
NO: 397 or clone 160-28-4-0-C4-CS, or biologically active fragments thereof. The intact protein of
interest is 173 amino acids in length, has an ARF family amino acid motif (Pfam), and has an ATP/GTP-
binding site motif A (P-loop) (PS00017). The protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS also
has chemical and structural similarity with human ARL1 (P40616), ARD-1 (R66033) and ARF6 (GI
178989) (31%, 31% and 27% identity, respectively). The amino acid length of SEQ ID NO: 397 is
similar to those of the aforementioned ARFs. Biologically active fragments of SEQ ID NO: 397 have one
or more of the biological activities typically associated with the full-length protein. In one embodiment, the
protein is encoded by clone 160-28-4-0-C4-CS.

The invention also provides variants of the protein of SEQ ID NO: 397 or clone 160-28-4-0-C4- 
CS. The variants have at least about 80%, more preferably at least about 90%, and most preferably at 
least about 95% amino acid sequence identity to the amino acid sequence of SEQ ID NO: 397 or clone
160-28-4-0-C4-CS. Variants according to the subject invention have at least one functional and/or
structural characteristic of ARFs. The invention also provides biologically active fragments of the variant 
proteins. 
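
The identity thresholds recited for variants (about 80%, 90%, and 95%) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned, with gaps written as "-", and computes identity over aligned residue pairs only; other conventions for the denominator exist, and the specification does not prescribe one. The function names and the example peptides are hypothetical and are not taken from SEQ ID NO: 397.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # gap columns are skipped (an assumption of this sketch)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the three tiers recited in the text."""
    if pct >= 95.0:
        return "at least about 95% (most preferred)"
    if pct >= 90.0:
        return "at least about 90% (more preferred)"
    if pct >= 80.0:
        return "at least about 80%"
    return "below the recited thresholds"


# Hypothetical 10-residue peptides differing at two positions.
reference = "MGLLSILRKL"
candidate = "MGLFSILRKV"
print(percent_identity(reference, candidate), identity_tier(percent_identity(reference, candidate)))
```

Because the text says "about", the hard cut-offs in `identity_tier` are a simplification made for illustration.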

The invention includes those polynucleotides encoding the protein of SEQ ID NO: 397 or clone
160-28-4-0-C4-CS, variants of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS, and biologically active
fragments of both the protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS and variants thereof. As is
apparent to those skilled in the art, a variety of different DNA sequences can encode the amino acid
sequence of the proteins, variants, and biologically active fragments of said proteins and variants. It is
well within the skill of a person trained in the art to create these alternative DNA sequences encoding
proteins having the same, or essentially the same, amino acid sequence. These variant DNA sequences
are also within the scope of the subject invention. As used herein, reference to "essentially the same"
sequence refers to sequences that have amino acid substitutions, deletions, additions, or insertions that do
not materially affect biological activity.
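
The statement that many different DNA sequences can encode the same amino acid sequence follows from the degeneracy of the genetic code. The sketch below, which assumes the standard genetic code and uses a hypothetical function name, counts how many distinct coding sequences exist for a short peptide by multiplying the number of synonymous codons at each position.

```python
import math

# Number of sense codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons are all accounted for).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA coding sequences for a peptide (stop codon excluded)."""
    try:
        return math.prod(CODONS_PER_RESIDUE[res] for res in peptide.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]!r}") from None


# A four-residue peptide already has dozens of synonymous encodings.
print(count_encodings("KKRP"))
```

The count grows multiplicatively with length, so for a protein of 173 residues the number of synonymous coding sequences is astronomically large.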

The subject invention provides a method of treating cytoskeletal, secretory, and inflammatory
disorders/conditions comprising the administration of therapeutically effective amounts of a composition
comprising the protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS. These methods can also be
practiced using variants of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS, or biologically active fragments
of either SEQ ID NO: 397 or clone 160-28-4-0-C4-CS, or variants of SEQ ID NO: 397 or clone
160-28-4-0-C4-CS. Disorders/conditions which can be treated by the subject invention include, but are not
limited to, prostate cancer, brain and other tumors, Lowe's syndrome, glomerulonephritis, chronic
glomerulonephritis, tubulointerstitial nephritis, inherited X-linked nephrogenic diabetes insipidus,
autosomal dominant polycystic kidney disease (ADPKD), herpes gestationis, dermatitis herpetiformis,
lupus erythematosus, Crohn's disease, irritable bowel syndrome and Addison's disease;
secretory/endocytotic disorders such as cystic fibrosis, glucose-galactose malabsorption syndrome,
hypercholesterolemia, hyper- and hypoglycemia, Graves' disease, goiter, and Cushing's disease;
conditions associated with abnormal vesicle trafficking, including acquired immunodeficiency syndrome
(AIDS); allergies including hay fever, asthma, and urticaria (hives); autoimmune hemolytic anemia;
multiple sclerosis; myasthenia gravis; rheumatoid and osteoarthritis; Chediak-Higashi and Sjogren's
syndromes; toxic shock syndrome; traumatic tissue damage; viral, bacterial, fungal, helminthic, and
protozoal infections.

In another embodiment, a vector capable of expressing the protein of SEQ ID NO: 397 or clone
160-28-4-0-C4-CS, or biologically active fragments thereof, can be administered to a subject to treat or
prevent disorders including, but not limited to, those described above. Alternatively, the vector can
encode a variant, or biologically active fragment of the variant protein. Multiple vectors encoding any
combination of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS, variants, and/or biologically active
fragments of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS and/or variants can be administered to a
subject.

In a further embodiment, a pharmaceutical composition comprising a substantially purified
protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS (and/or biologically active fragments thereof), in
conjunction with a suitable pharmaceutical carrier, can be administered to a subject to treat or prevent the
above mentioned disorders. Alternatively, a pharmaceutical composition comprising a substantially
purified variant protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS (and/or biologically active
fragments thereof), in conjunction with a suitable pharmaceutical carrier, can be administered in the
aforementioned therapeutic regimens. As would be apparent to the skilled artisan, any therapeutically
effective combination of the protein encoded by SEQ ID NO: 397 or clone 160-28-4-0-C4-CS (and/or
biologically active fragments thereof) and variants of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS
(and/or biologically active fragments thereof), in conjunction with a suitable pharmaceutical carrier, can
be used in the aforementioned therapeutic regimens.

ARFs are known to be involved in regulated transport of vesicles. Therefore, in another
embodiment, the protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS, variants, and/or biologically
active fragments of said proteins and/or variants can be used as a component of drug delivery vehicles
such as colloids or liposomes. The protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS, variants,
and/or biologically active fragments of said proteins and/or variants can be incorporated into the lipid
membranes of liposomes and can serve as specific targeting agents. Methods for designing such drug
delivery systems are known to those skilled in the art and can be practiced according to conventional
pharmaceutical principles (Smith H.J., Introduction to the Principles of Drug Design and Action, 3rd ed.
(1998); Chien Y.W., Novel Drug Delivery Systems, 2nd ed. (1992); Storm G. et al., J. Liposome Res. 4: 641-
666 (1994); and Crommelin D.J.A. et al., Adv. Drug Delivery Rev. 17: 49-60 (1995)).

In another embodiment of the invention, the polynucleotides encoding the protein of SEQ ID NO:
397 or clone 160-28-4-0-C4-CS can be used for therapeutic purposes. Polynucleotides encoding
fragments of the protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS can also be used in therapeutic
regimens. In one aspect, the complement of the polynucleotide encoding the protein of SEQ ID
NO: 397 or clone 160-28-4-0-C4-CS can be used in situations in which it would be desirable to block the
transcription of the mRNA. Modifications of gene expression can be obtained by designing
complementary sequences or antisense molecules (DNA, RNA, or PNA) to the control, 5', or regulatory
regions of the gene encoding the protein of interest. Such technology is now well known in the art, and
sense or antisense oligonucleotides or larger fragments can be designed from various locations along the
coding or control regions of sequences encoding the protein of interest. Methods of treatment utilizing
antisense technology are also well known to those skilled in the art.
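
As a simple illustration of designing a complementary sequence, the following sketch computes the reverse complement of a DNA sense strand, which is the sequence a DNA antisense oligonucleotide would carry. The function name and the example oligonucleotide are hypothetical; real antisense design also weighs target accessibility, length, and chemistry, none of which is modelled here.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(sense: str) -> str:
    """Return the reverse complement (5'->3') of a DNA sense-strand sequence."""
    sense = sense.upper()
    invalid = set(sense) - set("ACGT")
    if invalid:
        raise ValueError(f"unsupported bases: {sorted(invalid)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return sense.translate(_COMPLEMENT)[::-1]


# Hypothetical six-base target around a start codon.
print(antisense_dna("ATGGCC"))  # prints GGCCAT
```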

Another embodiment of the invention provides methods of assessing PLD modulation by using
ARF properties of the protein of interest.

In another embodiment, antibodies which specifically bind the protein of SEQ ID NO: 397 or
clone 160-28-4-0-C4-CS can be used for the diagnosis of disorders characterized by expression of the
protein, or in assays to monitor patients being treated with the protein of interest. Methods of making
both polyclonal and monoclonal antibodies are well known in the art. Diagnostic assays which can be
used in this aspect of the invention include, but are not limited to, ELISAs, RIAs, and FACS, and are
well known in the art. These assays also provide a basis for diagnosing or identifying altered or abnormal
levels of expression of SEQ ID NO: 397 or of the polypeptides encoded by the human cDNA of clone
160-28-4-0-C4-CS as compared to normal individuals. These screening methods are, likewise, well known
to the skilled artisan.

In another embodiment of the invention, the protein of interest, its catalytic or immunogenic
fragments, or oligopeptides thereof can be used for screening libraries of compounds in any of a variety of
drug screening techniques. The fragment employed in such screening can be free in solution, affixed to a
solid support, recombinantly expressed on, or chemically attached to, a cell surface, or located
intracellularly. The formation of binding complexes between the protein of interest and the agent being
tested can be measured by methods well known to those skilled in the art. Another technique for drug
screening provides for high throughput screening of compounds having suitable binding affinity to the
protein of interest. (See, e.g., Geysen, et al. (1984) PCT application WO84/03564.)

In another embodiment of the invention, the polynucleotides encoding the protein of interest can
be used for diagnostic purposes. The polynucleotides can be used to detect and quantify gene expression
in biopsied tissues in which expression of the protein of interest can be correlated with a disease or
condition. Such diagnostic assays are well known in the art and can be used to monitor regulation of
levels of the protein of interest during therapeutic intervention and/or to determine absence, presence, or
excess expression of the protein of interest. Examples of such conditions and disorders have been
provided supra. The polynucleotide sequences encoding the protein of interest can be used, for example,
in Southern or Northern analyses, dot blot, or other membrane-based technologies; in PCR technologies;
in dipstick, pin, and ELISA assays; and in microarrays utilizing fluids or tissues from patients to detect
altered expression of the protein of SEQ ID NO: 397 or clone 160-28-4-0-C4-CS. Such qualitative or
quantitative methods are well known in the art.

In further embodiments, oligonucleotides or longer fragments derived from any of the
polynucleotide sequences described herein can be used as targets in a microarray. The microarray can be
used to monitor the expression level of large numbers of genes simultaneously and to identify genetic
variants, mutations, and polymorphisms. This information can be used to determine gene function, to
understand the genetic basis of a disorder, to diagnose a disorder, and to develop and monitor the
activities of therapeutic agents. Microarrays can be prepared, used, and analyzed using methods known in
the art. (See, e.g., Brennan, T. M. et al. (1995) U.S. Pat. No. 5,474,796; Heller, R. A. et al. (1997) Proc.
Natl. Acad. Sci. 94: 2150-2155; and Heller, M. J. et al. (1997) U.S. Pat. No. 5,605,662.)

Another embodiment of the subject invention provides nucleic acid sequences encoding the
protein of interest which can be extended utilizing a partial nucleotide sequence and various PCR-based
methods. This aspect of the invention provides methods for the detection of upstream sequences, such as
promoters and regulatory elements. Methods of practicing this aspect of the invention are also well
known in the art.

In other embodiments of the disclosed therapeutic regimens, any of the proteins, variants,
biologically active fragments, antibodies, complementary sequences, or vectors of the invention can be
administered in combination with other appropriate therapeutic agents. Selection of the appropriate agents
for use in combination therapy can be made by one of ordinary skill in the art. The combination of
therapeutic agents can act synergistically to effect the treatment or prevention of the various disorders
described above. In particular, purified protein can be used to produce antibodies or to screen libraries of
pharmaceutical agents to identify those which specifically bind the protein of interest. Neutralizing
antibodies are especially preferred for therapeutic use.

Protein of SEQ ID NO: 287 (internal designation 174-5-3-0-H7-CS)

The protein of SEQ ID NO: 287, encoded by human cDNA of SEQ ID NO: 46 (clone 174-5-3-0-
H7-CS), is highly homologous (more than 99% identity in amino acids) to the human protein encoded by
the CLN8 gene listed in Genbank under accession number AF123757. The two proteins differ by two
conservative amino-acid substitutions (alanine for valine at position 155 and serine for asparagine at
position 225). In addition, the protein encoded by 174-5-3-0-H7-CS contains seven transmembrane
domains. These domains are located at amino acids 25-45, 71-91, 100-120, 133-153, 160-180, 205-225,
and 228-248 as predicted by the software TopPred II (Claros and von Heijne, CABIOS applic. Notes,
10:685-686 (1994)). The protein encoded by SEQ ID NO: 287 also exhibits a signal peptide at positions
1-50 and a retention signal KKRP from positions 283 to 286.
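
The predicted topology recited above can be written down as data and checked for internal consistency. The sketch below is illustrative only: the segment coordinates are the ones stated in the text, while the variable and function names are hypothetical. It reports the length of each predicted segment and locates the two substituted positions (155 and 225) relative to the segments.

```python
from typing import List, Optional, Tuple

# Predicted transmembrane segments (1-based, inclusive), as recited in the text.
TM_SEGMENTS: List[Tuple[int, int]] = [
    (25, 45), (71, 91), (100, 120), (133, 153),
    (160, 180), (205, 225), (228, 248),
]


def segment_lengths(segments: List[Tuple[int, int]]) -> List[int]:
    """Length in residues of each inclusive (start, end) segment."""
    return [end - start + 1 for start, end in segments]


def ordered_and_disjoint(segments: List[Tuple[int, int]]) -> bool:
    """True if segments are listed N- to C-terminal and never overlap."""
    return all(prev[1] < nxt[0] for prev, nxt in zip(segments, segments[1:]))


def locate(position: int, segments: List[Tuple[int, int]]) -> Optional[int]:
    """1-based index of the segment containing a residue, or None if in a loop."""
    for index, (start, end) in enumerate(segments, start=1):
        if start <= position <= end:
            return index
    return None


print(segment_lengths(TM_SEGMENTS))       # every predicted segment spans 21 residues
print(ordered_and_disjoint(TM_SEGMENTS))  # True
print(locate(155, TM_SEGMENTS), locate(225, TM_SEGMENTS))  # None 6
```

On these coordinates, position 155 falls in the short loop between the fourth and fifth predicted segments, and position 225 is the last residue of the sixth.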

CLN8 was identified recently by positional cloning (Ranta et al., Nat Genet. 1999 Oct;23(2):233-6).
CLN8 encodes a 286 amino-acid putative transmembrane protein with no homology to previously
known proteins. A naturally-occurring missense mutation in codon 24 (R24G at the border of the first
putative transmembrane domain) is the molecular basis for EPMR ("progressive epilepsy with mental
retardation", MIM 600143). EPMR, also called Northern Epilepsy, is an autosomal recessive disorder
characterized by normal early development, onset of generalized tonic-clonic seizures between the ages of
5 and 10 years, and subsequent progressive mental retardation. Neuropathological findings have shown
that EPMR is a new member of the neuronal ceroid lipofuscinosis (NCL) group of neurodegenerative
disorders. The NCLs are a genetically heterogeneous group of progressive neurodegenerative disorders
characterized by the accumulation of autofluorescent lipopigment in various tissues. CLN8 is the eighth
gene to be linked to the NCL group of neurodegenerative disorders.

Subsequently, the homologous mouse gene (Cln8) was sequenced (82% nucleotide identity with
the human gene) and localized to the region of the mouse genome linked to motor neuron degeneration,
mouse mnd. Mnd is a naturally-occurring mouse mutant with intracellular autofluorescent inclusions
similar to those seen in EPMR. A mutation in mnd mouse DNA was identified, indicating that mnd is a
murine ortholog for CLN8 (Ranta et al., Nat Genet. 1999 Oct;23(2):233-6), and that mice containing
mutations in Cln8 represent a murine model for NCL disorders.

Recent experimental evidence has confirmed the transmembrane nature of the CLN8 protein
(Lonka L et al., Hum Mol Genet. 2000 Jul 1;9(11):1691-7). CLN8 resides in the endoplasmic reticulum
(ER) and recycles between the ER and the ER-Golgi intermediate compartment (ERGIC) via a KKXX
ER-retrieval motif at its C-terminus (KKRP, amino-acids 283-286). This motif is recognized and bound
by COPI, a vesicle-coating protein found in retrograde vesicles delivering cargo from the cis Golgi to the
ER. The 30 kD CLN8 protein is not processed during its maturation (in particular it is not
N-glycosylated). The EPMR-associated R24G mutation does not alter cellular localization in humans.
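
The KKXX pattern described above (two lysines followed by any two residues at the extreme C-terminus) is easy to test for in a sequence. The sketch below is a minimal check under that definition only; the function name and the short example sequences are hypothetical, and it does not cover other dilysine variants or predict whether a motif is functional in a cell.

```python
def has_c_terminal_kkxx(protein: str) -> bool:
    """True if the last four residues match K-K-X-X (X = any residue)."""
    tail = protein.strip().upper()[-4:]
    # The motif is position-dependent: it must be the final four residues.
    return len(tail) == 4 and tail.isalpha() and tail.startswith("KK")


# Hypothetical sequences: the first ends in KKRP, the second carries KKRP internally.
print(has_c_terminal_kkxx("MSTAVLKKRP"))  # True
print(has_c_terminal_kkxx("MKKRPSTAVL"))  # False
```

For a 286-residue protein, the last four residues correspond to positions 283 to 286, which matches the location given for KKRP in the text.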

The subject invention provides a polypeptide encoded by SEQ ID NO: 287 and biologically
active fragments of said polypeptide. Compositions comprising polypeptides and pharmaceutically
acceptable carriers are likewise provided. Preferred polypeptides, and biologically active fragments
thereof, have any of the biological activities or domains/motifs described herein and/or contain the amino
acids of positions 155, 225, and 283 to 286. In one embodiment, the protein/polypeptide of SEQ ID NO:
287 is encoded by clone 174-5-3-0-H7-CS.

The ER/ERGIC cellular localization of the protein of this invention can be used to target compounds
to the ER/ERGIC. This targeting can be observed using any of the techniques known to those skilled in
the art, including those described in Lonka L et al., Hum Mol Genet. 2000 Jul 1;9(11):1691-7. In this
aspect of the invention, the protein of SEQ ID NO: 287, or biologically active fragments thereof, can be
used to target liposomes, vesicles, or colloids to the ER/ERGIC compartment where active agents can be
delivered. Methods of making and using targeted liposomes are well known in the art.

In another embodiment, liposomes comprising the protein of SEQ ID NO: 287 can contain a
second targeting agent for the specific selection of a target cell. The second targeting agent can be
selected for its ability to specifically target a cell or tissue. Thus, the second targeting agent can be
specific for tumor markers, such as HER2. Alternatively, markers associated with specific cell types can
be used (e.g., CD34, CD4, CD8, etc.). In a preferred embodiment, the second targeting agent is an
antibody. Active agents include, but are not limited to, chemotherapeutic agents, protein cross-linking
agents, inhibitors of protein synthesis, anti-bacterial agents (e.g., antibiotics), antiviral agents, and/or anti-
parasitic agents. The ability to bind the COPI coatomer can be assayed as described in Cosson P,
Letourneur F, Science. 1994 Mar 18;263(5153):1629-31.

In another embodiment, the present invention provides methods of, and compositions for,
identifying specific cellular compartments, such as the ER, ERGIC, and retrograde transport vesicles.
This embodiment provides antibodies which specifically bind the protein of SEQ ID NO: 287, or
biologically active fragments thereof, which are labeled with detectable markers, such as gold particles,
enzymes, radioisotopes, or paramagnetic labels. ER, ERGIC, and retrograde transport vesicles can be
identified in samples according to well-known immuno-diagnostic protocols. The antibodies, either
monoclonal or polyclonal, can be made according to well-known methods. In a preferred embodiment,
the antibodies bind to the ER retention signal.

In another embodiment, the protein of the invention or part thereof can be used as a reagent for
differential identification of the tissue(s) or cell type(s) present in a biological sample and for diagnosis of
diseases and conditions, which include, but are not limited to, asthma, pulmonary edema, atherosclerosis,
restenosis, stroke potential, thrombosis and hypertension. Similarly, the protein of the invention, or
biologically active fragments thereof, and antibodies thereto can provide immunological probes for
differential identification of the tissue(s) or cell type(s). In a number of disorders listed above,
particularly of the pulmonary and cardiovascular systems, expression of this protein at significantly
higher or lower levels can be routinely detected in certain tissues or cell types (e.g., vascular tissues,
cancerous and wounded tissues) or bodily fluids (e.g., lymph, serum, plasma, urine, synovial fluid and
spinal fluid) or another tissue or cell sample taken from an individual having such a disorder, relative to
the standard gene expression level, i.e., the expression level in healthy tissue or bodily fluid from an
individual not having the disorder.

Indeed, the first 80 amino acids of the protein of the invention are identical to two polypeptides
claimed in Patent WO 99/35158, hereby incorporated by reference in its entirety (SEQ ID NO: 98 and
SEQ ID NO: 162, corresponding to Geneseq accession numbers Y38413/Y38428 and Y38492), which are
over-expressed in pulmonary and endothelial tissues.

The tissue distribution in pulmonary and endothelial tissues indicates that the protein product
described in WO 99/35158 is useful for the treatment and diagnosis of cardiovascular and respiratory or
pulmonary disorders such as asthma, pulmonary edema, pneumonia, atherosclerosis, restenosis, stroke,
angina, thrombosis, hypertension, inflammation, and wound healing. Those conditions can be diagnosed
by determining the amount of the protein of the invention in a sample. Thus, antibodies raised against the
protein of SEQ ID NO: 287, or an immunogenic fragment of the protein, can be used in diagnostic,
prognostic, or screening assays such as those taught in WO 99/35158.

Protein of SEQ ID NO: 270 (internal designation 116-119-3-0-H5-CS)

The protein of SEQ ID NO: 270 encoded by the extended cDNA SEQ ID NO: 29 is homologous to 
the human mitochondrial ATP synthase f subunit or ATPK (E.C 3.6.1.34) (Swissprot accession number 
P56134) and is overexpressed in fetal kidney. 

The protein of SEQ ID NO: 270, composed of 88 amino acid residues, contains 1 transmembrane
segment (position 1 to 55) predicted by the software TopPred II (Claros and von Heijne, CABIOS applic.
Notes, 10:685-686 (1994)). BLAST results show that 100% homology is found between amino acids 5 to 88
of the protein of the invention and amino acids 10 to 93 of human ATP synthase f chain (93 amino acids
total), exon 1 of the cDNA SEQ ID NO: 29 making the difference between the 2 proteins (the last 3 exons
show 100% homology). Thus, the protein of the invention represents a new isoform of human mitochondrial
ATP synthase f subunit. It is interesting to note that the same splice variant is found in bovine, pig and mouse
species.

The mitochondrial electron transport (or respiratory) chain is a series of enzyme complexes in the
mitochondrial membrane that is responsible for the transport of electrons from NADH to oxygen and the
coupling of this oxidation to the synthesis of ATP (oxidative phosphorylation). ATP then provides the
primary source of energy for driving a cell's many energy-requiring reactions. ATP synthase (F0F1
ATPase) is the enzyme complex at the terminus of this chain and serves as a reversible coupling device that
interconverts the energies of an electrochemical proton gradient across the mitochondrial membrane into
either the synthesis or hydrolysis of ATP. This gradient is produced by other enzymes of the respiratory chain
in the course of electron transport from NADH to oxygen. When the cell's energy demands are high, electron
transport from NADH to oxygen generates an electrochemical gradient across the mitochondrial membrane.
Proton translocation from the outer to the inner side of the membrane drives the synthesis of ATP. Under
conditions of low energy requirements and when there is an excess of ATP present, this electrochemical
gradient is reversed and ATP synthase hydrolyzes ATP. The energy of hydrolysis is used to pump protons
out of the mitochondrial matrix. ATP synthase is, therefore, a dual complex, the F0 portion of which is a
transmembrane proton carrier or pump, and the F1 portion of which is catalytic and synthesizes or hydrolyzes
ATP. Mammalian ATP synthase complex consists of sixteen different polypeptides (Walker, J. E. and
Collinson, T. R. (1994) FEBS Lett. 346: 39-43). Six of these polypeptides (subunits alpha, beta, gamma, delta,
epsilon, and an ATPase inhibitor protein IF1) comprise the globular catalytic F1 ATPase portion of the
complex, which lies outside of the mitochondrial membrane. The remaining ten polypeptides (subunits a, b,
c, d, e, f, g, F6, OSCP, and A6L) comprise the proton-translocating, membrane-spanning F0 portion of the
complex. Like other members of the respiratory chain, all but two of the polypeptide subunits of ATP
synthase are nuclear gene products that are imported into the mitochondria. Enzyme complexes similar to
mammalian ATP synthase are found in all cell types and in chloroplast and bacterial membranes. This
universality indicates the central importance of this enzyme to ATP metabolism. Transcriptional regulation of
these nuclear encoded genes appears to be the predominant means for controlling the biogenesis of ATP
synthase. Multiple mitochondrial pathologies exist because of the essential role of mitochondrial oxidative
phosphorylation in cellular energy production, in the generation of reactive oxygen species and in the
initiation of apoptosis (Wallace, Science, 283:1482-1488, 1999). It is now clear that mitochondrial diseases
encompass an assemblage of clinical problems commonly involving tissues that have high energy
requirements such as heart, muscle and the renal and endocrine systems. Over the past 11 years, a
considerable body of evidence has accumulated implicating defects in the mitochondrial energy-generating
pathway, oxidative phosphorylation, in a wide variety of degenerative diseases including myopathy and
cardiomyopathy. Most classes of pathogenic mitochondrial DNA mutations affect the heart, in association
with a variety of other clinical manifestations that can include skeletal muscle, the central nervous system
(including eye), the endocrine system, and the renal system. Nuclear mutations causing mitochondrial
disorders have been described. They are often found in highly conserved subunits. Mitochondrial disorders
with nuclear mutations include: myopathies (PEO, MNGIE, congenital muscular dystrophy, carnitine
disorders), encephalopathies (Leigh, Infantile, Wilson's disease, Deafness-Dystonia syndrome), other
systemic disorders and cardiomyopathies.

The discovery of a new ATP synthase subunit, and of polynucleotides encoding it, satisfies a need in the
art by providing new compositions which are useful for the diagnosis, prevention, and treatment of cancer,
myopathies, immune disorders, and neurological disorders.

It is believed that the protein of SEQ ID NO: 270 or part thereof plays a role in cellular
respiration, preferably as a mitochondrial ATP synthase subunit. Preferred polypeptides of the invention
are fragments of SEQ ID NO: 270 having any of the biological activities described herein.

Objects of the present invention are compositions and methods for targeting heterologous
compounds, either polypeptides or polynucleotides, to mitochondria by recombinantly or chemically
fusing a fragment of the protein of the invention to a heterologous polypeptide or polynucleotide.
Preferred fragments are signal peptide, amphiphilic alpha helices and/or any other fragments of the
protein of the invention, or part thereof, that may contain targeting signals for mitochondria including but
not limited to matrix targeting signals as defined in Herrmann and Neupert, Curr. Opinion Microbiol.
3:210-4 (2000); Bhagwat et al. J. Biol. Chem. 274:24014-22 (1999); Murphy Trends Biotechnol. 15:326-
30 (1997); Glaser et al. Plant Mol Biol 38:311-38 (1998); Ciminale et al. Oncogene 18:4505-14 (1999).
Such heterologous compounds may be used to modulate mitochondrial activities. For example, they
may be used to induce and/or prevent mitochondrial-induced apoptosis or necrosis. In addition,
heterologous polynucleotides may be used for mitochondrial gene therapy to replace a defective
mitochondrial gene and/or to inhibit the deleterious expression of a mitochondrial gene.

The invention further relates to methods and compositions using the protein of the invention or
part thereof to diagnose, prevent and/or treat several disorders in which the mitochondrial respiratory electron
transport chain is impaired, including but not limited to mitochondriocytopathies, necrosis, aging,
myopathies, cancer and neurodegenerative diseases such as Alzheimer's disease, Huntington's disease,
Parkinson's disease, epilepsy, Down's syndrome, dementia, multiple sclerosis, and amyotrophic lateral
sclerosis. For diagnostic purposes, the expression of the protein of the invention could be investigated
using any of the Northern blotting, RT-PCR or immunoblotting methods described herein and compared
to the expression in control individuals. For prevention and/or treatment purposes, the protein of the
invention may be used to enhance electron transport and increase energy delivery using any of the gene
therapy methods described herein or known to those skilled in the art.

In another embodiment, the invention further relates to methods and compositions using the
protein of the invention or part thereof to diagnose, prevent and/or treat several disorders in which the
mitochondrial respiratory electron transport chain needs to be impaired, including but not limited to Sjogren's
syndrome, Addison's disease, bronchitis, dermatomyositis, polymyositis, glomerulonephritis, diabetes
mellitus, emphysema, Graves' disease, atrophic gastritis, lupus erythematosus, myasthenia gravis, multiple
sclerosis, autoimmune thyroiditis, ulcerative colitis, anemia, pancreatitis, scleroderma, rheumatoid and
osteoarthritis, asthma, allergic rhinitis, atopic dermatitis, and gout, using any
techniques known to those skilled in the art including the antisense or triple helix strategies described
herein.

Moreover, antibodies to the protein of the invention or part thereof may be used for detection of
mitochondria organelles and/or mitochondrial membranes using any techniques known to those skilled in
the art.

Protein of SEQ ID NO: 271 (internal designation 117-001-5-0-G3-CS)

The protein of SEQ ID NO: 271 is homologous to the family of lipopolysaccharide (LPS)
binding proteins (LBPs). Several families of proteins have the ability to bind LPS including (a) the
lipopolysaccharide-binding proteins (LBPs), and (b) the bactericidal permeability-increasing proteins
(BPIs). Cholesteryl ester transfer protein (CETP), which is involved in the transfer of insoluble
cholesteryl esters in reverse cholesterol transport, shares some homology to members of the LPS binding
family of proteins.

Lipopolysaccharide (LPS), alternatively known as bacterial endotoxin, is a major component of the
outer membrane of Gram-negative bacteria. It consists of serotype-specific O-side chain polysaccharides
linked to a core oligosaccharide and Lipid A. LPS is a potent mediator of the inflammatory response and
stimulates the expression of many pro-inflammatory and pro-coagulant compounds in monocytes,
macrophages, and endothelial cells. While these responses are important in containing and eliminating
localized infections, systemic exposure to LPS can lead to a number of adverse effects. These include: (a)
induction of an inflammatory cascade, (b) damage to the endothelium, (c) widespread coagulopathies, and (d)
organ damage.

Systemic exposure to LPS can arise from direct infection by Gram-negative bacteria, leading to the
complications of Gram-negative sepsis. Examples of diseases which are associated with Gram-negative
bacterial infection or endotoxemia include bacterial meningitis, neonatal sepsis, cystic fibrosis,
inflammatory bowel disease, liver cirrhosis, Gram-negative pneumonia, Gram-negative abdominal
abscess, hemorrhagic shock, and disseminated intravascular coagulation. Subjects who are leukopenic or
neutropenic, including subjects treated with chemotherapy or immunocompromised subjects, are particularly
susceptible to bacterial infection and the subsequent effects of endotoxin exposure.

Gram-negative sepsis remains one of the primary causes of severe systemic inflammation in
hospitalized and immunocompromised patients. Alternatively, changes in gut permeability caused by a variety of
circumstances, including trauma, can lead to translocation of bacteria/LPS into the bloodstream. Bacteria
translocated from the gut are thought to play a major role in post-surgical immunosuppression (Little et al.,
Surgery. 114: 87-91 (1993)) and hemorrhagic shock. Therefore, there is great interest in characterizing
proteins involved in the biological response to LPS and in discovering therapies that can counteract the effects of
LPS in pathological situations.

LBP is a 60 kDa glycoprotein synthesized in the liver and present in normal human serum. LBP
expression is upregulated in response to infectious, inflammatory, and toxic mediators. LBP expression has
been induced in animals challenged with LPS, silver nitrate, turpentine, and Corynebacterium parvum (Geller
et al., Surgery 128:22-28 (1993); Gallay et al., Infect. Immun. 61:378-383 (1993); Tobias et al., J. Exp. Med.
164:77-793 (1986)). LBP levels are correlated with exposure to LPS, and elevated levels (particularly
persistently elevated levels) have been correlated with poor clinical outcomes in septic patients (U.S. Patent
Nos. 5,484,705 and 5,804,367, hereby incorporated by reference in their entirety).

A portion of the LBP molecule (the N-terminal 1-197 aa) binds to the lipid A portion of the LPS
molecule to form a high affinity LBP/LPS complex (Tobias et al., J. Biol. Chem. 264:10867-10871
(1989)). The LBP/LPS complex potentiates the cellular response to LPS via an interaction with the
monocytic differentiation antigen CD14 (Wright et al., Science 249:1431-1433 (1990); Lee et al., J. Exp.
Med. 175:1697-1705 (1992)). LPS can be transferred from LBP to membrane-bound or soluble CD14.
Activated CD14 can then interact with endothelial cells to elicit an inflammatory response. The
C-terminal portion of LBP is required to transfer LPS to CD14 (U.S. Pat. No. 5,731,415; Theofan et al., J.
Immunol. 152:3624-29 (1994); Han et al., J. Biol. Chem. 269:8172-75 (1994)). Evidence also suggests
that LBP can neutralize LPS by an interaction with serum lipoproteins or through the internalization of an
LBP/LPS/CD14 complex by neutrophils (Wurfel et al., J. Exp. Med. 180:1025-1035 (1994); Wurfel et al.,
J. Exp. Med. 181:1743-54 (1995); Gegner et al., J. Biol. Chem. 20:5320-5325 (1995)).

The subject invention provides the polypeptide of SEQ ID NO: 271 and polynucleotide sequences
encoding the amino acid sequence of SEQ ID NO: 271. In one embodiment, the polypeptides of SEQ
ID NO: 271 are interchanged with the polypeptides encoded by the human cDNA of clone
181-20-3-0-B5-CS. Also included in the invention are biologically active fragments of the protein of SEQ ID NO:
271 and polynucleotide sequences encoding these biologically active fragments. In a preferred
embodiment, biologically active fragments of SEQ ID NO: 271 are encoded by clone 181-20-3-0-B5-CS
and comprise the first 181 amino acids encoded by clone 181-20-3-0-B5-CS. "Biologically active
fragments" are defined as those peptide or polypeptide fragments of SEQ ID NO: 271 which have at least
one of the biological functions of the full length protein (e.g., the ability to bind bacterial LPS).

The invention also provides variants of SEQ ID NO: 271. These variants have at least about
80%, more preferably at least about 90%, and most preferably at least about 95% amino acid sequence
identity to the amino acid sequence of SEQ ID NO: 271. Variants according to the subject invention also
have at least one functional or structural characteristic of SEQ ID NO: 271, such as the biological
functions described above. The invention also provides biologically active fragments of the variant
proteins. Unless otherwise indicated, the methods disclosed herein can be practiced utilizing the
polypeptide of SEQ ID NO: 271 or variants thereof. Likewise, the methods of the subject invention can
be practiced using biologically active fragments of the protein of SEQ ID NO: 271 or variants of said biologically
active fragments.
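
By way of illustration only, the percent amino acid sequence identity thresholds recited above (about 80%, 90%, and 95%) can be checked with a short calculation. The following Python sketch assumes that the two sequences have already been aligned to equal length, with "-" marking gaps, and counts identity only over columns in which both sequences have a residue. Other conventions for the denominator exist and may give different values. The function names and the ten-residue toy sequences are hypothetical and are not taken from SEQ ID NO: 271.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: inputs are pre-aligned, equal-length strings with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the tiers recited in the text."""
    if pct >= 95.0:
        return "at least 95%"
    if pct >= 90.0:
        return "at least 90%"
    if pct >= 80.0:
        return "at least 80%"
    return "below 80%"


# Toy example: one substitution in ten aligned residues.
reference = "ACDEFGHIKL"
variant = "ACDEFGHIKV"
pct = percent_identity(reference, variant)
print(pct, identity_tier(pct))  # 90.0 at least 90%
```

In practice the alignment itself would be produced by an established alignment program; the sketch covers only the final counting step.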

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode
SEQ ID NO: 271. It is well within the skill of a person trained in the art to create these alternative DNA
sequences which encode proteins having the same, or essentially the same, amino acid sequence. These
variant DNA sequences are, thus, within the scope of the subject invention. As used herein, reference to
"essentially the same sequence" refers to sequences that have amino acid substitutions, deletions,
additions, or insertions that do not materially affect biological activity. Fragments retaining one or more
characteristic biological activities of SEQ ID NO: 271 are also included in this definition.

"Recombinant nucleotide variants" are alternate polynucleotides which encode a particular
protein. They can be synthesized, for example, by making use of the "redundancy" in the genetic code.
Various codon substitutions, such as the silent changes which produce specific restriction sites or codon
usage-specific mutations, can be introduced to optimize cloning into a plasmid or viral vector or
expression in a particular prokaryotic or eukaryotic host system, respectively.
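
The extent of the redundancy referred to above can be illustrated numerically. The following Python sketch, offered as an illustration only, builds the standard genetic code, counts how many codons specify each amino acid, and multiplies those counts to give the number of distinct DNA coding sequences for a given peptide. The function name and the example peptides are hypothetical; the standard (nuclear) code is assumed and the stop codon is not counted.

```python
from collections import Counter
from math import prod

# Standard genetic code, enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons per amino acid ('*' marks the three stop codons).
DEGENERACY = Counter(CODON_TABLE.values())


def count_synonymous_sequences(peptide: str) -> int:
    """Number of distinct DNA sequences encoding the peptide (stop codon excluded)."""
    peptide = peptide.upper()
    for aa in peptide:
        if aa == "*" or aa not in DEGENERACY:
            raise ValueError(f"not a standard amino acid: {aa!r}")
    return prod(DEGENERACY[aa] for aa in peptide)


print(count_synonymous_sequences("MW"))  # 1: Met and Trp each have a single codon
print(count_synonymous_sequences("ML"))  # 6: Leu has six codons
```

Because the count is a product over all residues, it grows very quickly with peptide length, which is why a full-length protein can be encoded by an enormous number of alternative DNA sequences.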

The protein of SEQ ID NO: 271, and variants thereof, can be used to produce antibodies
according to methods well known in the art. The antibodies can be monoclonal or polyclonal. Antibodies
can also be synthesized against fragments of SEQ ID NO: 271, as well as variants thereof, according to
known methods. The subject invention also provides antibodies which specifically bind to biologically
active fragments of SEQ ID NO: 271 or biologically active fragments of SEQ ID NO: 271 variants.

The subject invention also provides for immunoassays which are used to screen for, monitor, or
diagnose exposure to LPS. In one embodiment, diagnostic assays measure the level of LBP in patient
plasma samples. LBP levels are known to rise in response to exposure to LPS; thus, the measurement of
the level of the protein of SEQ ID NO: 271 can provide an early indication of Gram-negative infection or
of endotoxin exposure.

The subject invention provides methods of treating individuals infected with Gram-negative
bacteria comprising the administration of therapeutically-effective compositions comprising SEQ ID NO:
271. In one embodiment, the protein lacks the C-terminal portion (or portions of the C-terminal domain)
necessary to transfer LPS to CD14. LPS can be scavenged by the excess N-terminal fragment and would
be unable to induce an inflammatory response (see, e.g., U.S. Patent No. 5,731,415, hereby incorporated
by reference in its entirety).

Another aspect of the subject invention provides methods of prophylaxis. The method treats
individuals by administration of therapeutically-effective amounts of compositions comprising SEQ ID
NO: 271. Instances where this aspect of the invention can be performed include, but are not limited to,
conditions associated with increased translocation of gut bacteria and endotoxin, particularly prior to
surgery. In addition, patients who are at risk for potential Gram-negative infection, including but not limited to
patients undergoing chemotherapy, or patients who are immunocompromised (for example with AIDS),
can benefit from such treatment. Such uses are described in U.S. Patent No. 5,990,082, hereby
incorporated by reference in its entirety.

The N-terminal portion of LBP, which lacks the ability to induce an inflammatory response, can
be fused to other proteins or fragments thereof (such as the bactericidal/permeability-increasing protein or
BPI) which can increase the association of these molecules with LPS and aid in the clearance of
endotoxin from patients who have been exposed to Gram-negative bacteria. Such preparations can be
used to treat and inhibit a number of Gram-negative, Gram-positive, or fungal infections, as
described in the following patents: WO 95/19179 A, WO 95/19180 A, WO 95/19372 A, and WO
96/34873 A, each of which is incorporated by reference in its entirety.

The subject invention also provides methods of removing endotoxin from recombinantly-
produced proteins. In one embodiment, the recombinantly-produced proteins are obtained from
Gram-negative bacteria. In a preferred embodiment, the bacteria are E. coli. In another embodiment, the
protein of SEQ ID NO: 271, biologically active fragments thereof, variants, or derivatives thereof, are
contacted with compositions comprising recombinantly-produced proteins. The contacting step can take
place with SEQ ID NO: 271 immobilized on a substrate or with SEQ ID NO: 271 present in free solution.

In addition, the protein of SEQ ID NO: 271, biologically active fragments, or derivatives thereof, can
be used in diagnostic assays to measure the level of LPS in patient plasma samples. In such an assay,
serum samples would be bound to a solid matrix, such as a membrane, plastic, treated plastic, or other
supports, and then probed with the protein of SEQ ID NO: 271. Visualization can be achieved by fusing
the protein of SEQ ID NO: 271 to any number of enzymes followed by treatment with a chromogenic,
fluorogenic, or luminescent substrate. Alternatively, the protein of SEQ ID NO: 271, biologically active
fragments, variants, or derivatives thereof, can be linked to a fluorescent or luminescent protein or
compound. The linkage can be chemical or made by recombinant techniques known to those skilled in
the art. In addition, antibodies raised against the protein of SEQ ID NO: 271, biologically active
fragments, variants, or derivatives thereof can be used to visualize the LPS/protein 271 complexes using
immunoassays known to those skilled in the art.

Protein of SEQ ID NO:266 (internal designation 116-110-2-0-F4-CS)

The protein of SEQ ID NO:266, highly expressed in the testis, is encoded by the cDNA of SEQ ID
NO:25 and exhibits homology to the Ly-6 family of GPI-linked cell-surface glycoproteins composed of
one or more copies of a conserved domain of about 100 amino-acid residues (PS00983; LY6_UPAR).

The protein of SEQ ID NO:266 shows significant structural similarities to mouse Ly-6 antigens,
human CD59 and a herpes virus CD59 homolog. The protein of SEQ ID NO:266 displays one copy of
the motif of the u-PAR/Ly-6 domain, with all ten extracellular cysteine residues conserved. The mature
protein sequence contains a relatively high proportion of cysteine residues (10/105), which suggests that
numerous disulfide bonds stabilize its tertiary structure. Furthermore, the 124 amino-acid long protein of
SEQ ID NO:266 has a size very similar to that of many members of the Ly-6 family. In addition, the
protein of the invention has a predicted signal peptide structure (positions from 1 to 19) and a C-terminal
hydrophobic fragment (positions from 101 to 121) necessary for GPI-anchoring in a membrane. Thus,
the protein of the invention has a clear evolutionary relationship with the Ly-6/uPAR family, particularly
with the Ly-6 subfamily.

The Ly-6/uPAR protein family members share one or several repeat units of the Ly-6/uPAR
domain, which is defined by a distinct disulfide bonding pattern between 8 or 10 cysteine residues. This
family can be divided into two subfamilies. One comprises GPI-anchored glycoprotein receptors with 10
cysteine residues. Another subfamily includes the secreted single-domain snake and frog cytotoxins, and
differs significantly in that its members generally possess only eight cysteines and no GPI-anchoring
signal sequence (Andermann K et al. Protein Sci 8(4):810-819 (1999)). The Ly-6 family members are
low molecular weight phosphatidyl inositol anchored glycoproteins with remarkable amino acid
homology throughout a distinctive cysteine rich protein domain that is associated predominantly with
O-linked carbohydrate. Their GPI links are necessary to anchor these cell surface proteins to the outside of
the lipid bilayer membrane. The Ly-6 family includes human CD59, which protects from complement-
mediated membrane damage, squid Sgp1 and Sgp2, urokinase plasminogen activator receptor, murine
Sca-1 and Sca-2, and many other proteins. The general structure seen within the Ly-6 family resembles
that of the receptor for a urokinase-type plasminogen activator and the alpha-neurotoxins from snake
venoms (Fleming T J et al. J Immunol 150:5379-5390 (1993); Ploug M and V Ellis FEBS Lett
349:163-168 (1994)).

The Ly-6 cell surface proteins are differentially expressed in several hematopoietic lineages and
appear to function in signal transduction and cell activation, predominantly on lymphoid cells in the
mouse. Analyses using anti-Ly-6A/E monoclonal antibodies have also demonstrated in situ expression of
Ly-6 molecules in brain tissue (staining primarily associated with vascular elements throughout the brain).
These proteins do not appear to be expressed during embryonic or neonatal stages of development (Cray
C et al. Brain Res Mol Brain Res 8(1):9-15 (1990)).

Ly-6 protein expression has been shown to be factor-dependent. For example, the expression of
Ly-6A/E, which normally occurs in hemopoietic stem cells, fibroblasts, and T and B lymphocytes, has
been shown to be greatly induced by IFN-B in various tissues and cell lines. In addition, the Ly-6E Ag is
associated with tyrosine kinases in T cells, and reduced expression of Ly-6E in T cells impairs normal
functional responses, as well as tyrosine kinase activity, in these cells. Further, the IFNs are important in
the generation of memory CD8+ T cells, and it has been demonstrated that the expression of Ly-6C Ag is
a strong marker for the memory phenotype (Mehran M. et al. Journal of Immunology 163: 811-819
(1999)). Like its murine counterparts, a human homologue of the Ly-6 genes, the 9804 gene, is responsive
to IFNs. The 9804 gene is also inducible by retinoic acid during differentiation of acute promyelocytic
leukemia cells. Further, cultured glial and neuronal cells express high levels of Ly-6A/E following
incubation with cytokines, including rIFN-gamma (Cray C et al. Brain Res Mol Brain Res 8(1):9-15
(1990)). Another member of the Ly-6 family, human protein RoBo-1, shows increased expression in
response to two modulators of bone metabolism, estradiol and intermittent mechanical loading,
suggesting a role in bone homeostasis (Noel LS et al. J Biol Chem, Vol. 273(7): 3878-3883 (1998)). Such
factor-dependence of expression makes Ly-6 proteins either candidates or targets for alloresponses and
autoimmune disease. For example, the high level factor-induced expression of LY-6s has been associated
with lupus nephritis (Blake P G et al. J Am Soc Nephrol 4:1140-1150 (1993)).

Murine Ly-6 molecules have interesting patterns of tissue expression during haematopoiesis, from
multipotential stem cells to lineage committed precursor cells, and on specific leukocyte subpopulations
in the peripheral lymphoid tissues. These patterns suggest an intimate association between the regulation
of Ly-6 expression and the development and homeostasis of the immune system (Gumley TP et al.
Immunol Cell Biol 73(4):277-296 (1995)). Ly-6M messenger RNA (mRNA) is easily detectable in
hematopoietic tissue (bone marrow, spleen, thymus, peritoneal macrophages) as well as kidney and lung
(Patterson JM et al. Blood 95(10):3125-3132 (2000)).

Normally, human blood cells are protected against autologous complement activation by
membrane proteins that block the assembly of functional complement pores. One such protein is the human
Ly-6 protein CD59. Administration of CD59 prevents hemolytic disease or thrombosis. Further, the CD59
protein may prevent the complement-mediated lysis and activation of endothelial cells that leads to
hyperacute rejection, and therefore may be administered during xenogeneic organ transplantation (Binette, J. P.
and Binette, M. B., Scanning Microsc., 7:1107-10 (1993)).

The surface receptor for urokinase plasminogen activator (uPAR) has been recognized in recent
years as a key molecule in regulating plasminogen mediated extracellular proteolysis. Surface
plasminogen activation controls the connections between cells, basement membrane and extracellular
matrix, and therefore the capacity of cells to migrate and invade neighboring tissues (Roldan AL et al.
EMBO J 9(2):467-474 (1990)). Certain factors of the PA system, such as u-PAR, have been detected in
organs of the male reproductive tract in various species. Morphological studies provide support for the
involvement of the PA system in human male reproductive physiology (Gunnarsson M et al. Mol Hum
Reprod 5(10):934-940 (1999)).

LY-6 proteins have been suggested to play important roles in disorders such as cancers,
nephropathies, autoimmune diseases, hemolytic disease, thrombosis, Alzheimer's disease, etc. Several
members of the murine Ly-6 supergene family are clearly involved in the progression of certain mouse
tumors, as their expression level is higher in highly malignant cells than in tumor cells with a lower
malignancy phenotype. Sorting by flow cytometry of tumor cells to subpopulations expressing either
high or low levels of Ly-6E.1 yielded cells expressing a high or a low malignancy phenotype,
respectively. Further, it was shown that LY-6 is highly expressed on non-lymphoid tumor cells
originating from a variety of tissues in mice. Upregulation or high expression is correlated with a more
malignant phenotype which results in higher efficiency of local tumor production (Katz et al. Int J Cancer
59:684-91 (1994)).

Cells derived from angiogenic tumors express a higher tumorigenicity phenotype and a higher
capacity to produce artificial pulmonary metastases than cells from poorly angiogenic tumors. These
cells also express significantly higher levels of the lymphocyte activation protein Ly-6E, so the
angiogenic phenotype appears to be coregulated with Ly-6 (Sagi-Assif O et al. Immunol Lett
54(2-3):207-13 (1996)). Some LY-6 proteins also block secretion of interleukin-2 (IL-2), which is an approved
anticancer agent and a key regulatory hormone in cell-mediated immunity (Fleming T J and T R Malek J
Immunol 153:1955-62 (1994)). IL-2 stimulates the proliferation of both T and natural killer (NK) cells and
activates NK cells, which can directly lyse freshly isolated, solid tumor cells.

The high malignancy, high Ly-6E.1-expressing cells also expressed high levels of the receptor for
urokinase plasminogen activator (uPAR), whereas low malignancy, low Ly-6E.1-expressing cells also
expressed low levels of uPAR. Transfection studies have indicated that uPAR is causally involved in
conferring a high malignancy phenotype upon tumor cells expressing high levels of Ly-6E.1. E48, a
human homologue of the murine ThB Ly-6 protein, is expressed on head and neck squamous carcinoma
cells. In E48-stimulated cells, the binding of E48 to its microenvironmental ligand appears to transduce a
signal that up-regulates the expression of the FX enzyme in these cells, leading to an increase in the levels
of GDP-L-fucose (Rinat Eshel et al. J Biol Chem, Vol. 275(17): 12833-12840 (2000)). A congenital
disorder of leukocyte adhesion to vascular endothelium termed LADII is reflected in a generalized fucose
deficiency and major defects in leukocyte trafficking and function. Ly-6 loss-variants of a murine tumor
exhibit alterations in the incorporation of fucose and mannose into cellular glycoconjugates (Witz IP J.
Cell. Biochem. Suppl. 34:61-66 (2000)).

It is believed that the protein of SEQ ID NO:266 is a novel member of the Ly-6 protein family,
and is thus a specific cell-surface glycoprotein antigen involved in signal transduction and cell activation,
proliferation and differentiation. Preferred polypeptides of the invention are polypeptides comprising the
amino acids of SEQ ID NO:266 from position 1 to position 18 and from position 19 to position 124.
Other preferred polypeptides of the invention are any fragments of SEQ ID NO:266 having any of the
biological activities described herein.

In one embodiment, this invention relates to methods and compositions using the protein of the
invention or part thereof as a marker protein to selectively identify tissues, preferably testis. For example,
the protein of the invention or part thereof may be used to synthesize specific antibodies using any technique
known to those skilled in the art. Such tissue-specific antibodies may then be used to identify tissues of
unknown origin, such as forensic samples, differentiated tumor tissue that has metastasized to foreign
bodily sites, etc., or to differentiate different tissue types in a tissue cross-section using immunochemistry.

Another embodiment of the present invention relates to methods of using the protein of the
invention or part thereof and related compounds and derivatives to diagnose developmental and malignant
disorders in tissues including urogenital tissues and other tissues of the reproductive system of both sexes.
For example, a biological sample is obtained from a patient with cancer or at risk of developing cancer,
and the level of SEQ ID NO:25 polynucleotides or encoded polypeptides is detected within the cells of
the sample. The detection of an elevated level of the SEQ ID NO:25 polynucleotides or encoded
polypeptides in the sample relative to a control level indicates the presence of malignant cells within the
patient. The expression of the protein of the invention can be investigated using any of a number of
methods, including, but not limited to, Northern blotting, RT-PCR or immunoblotting.

Another embodiment of the invention relates to compositions and methods using the protein of
the invention or part thereof in recombinant protein form as pharmacological agents in the treatment of
developmental and malignant disorders in tissues including urogenital tissues and in other tissues of
the human reproductive system. Particularly, the protein of the invention or part thereof can be used in the
treatment of disorders which are manifested by male sterility.

In another embodiment of the invention, antibodies which bind to the protein of the invention or
part thereof are used in the treatment of tumors, e.g., human urogenital tumors, especially to enhance the
secretion of interleukin-2, which is an approved anticancer agent and key regulatory hormone in
cell-mediated immunity. Such antibodies can be used alone or bound to a substance capable of ablating or
killing cells as a therapy for urogenital disorders or cancers in which the protein of the invention is
overexpressed.

The protein of the invention or part thereof may also be used in the treatment of diseases which
can require transplantation, including various forms of cancers such as genitourinary cancers, carcinomas,
sarcomas, atherosclerosis, angiogenesis, and benign tumors. As mentioned above, the Ly-6 family includes
several proteins which are similar to the protein of the invention and which are capable of protecting cells
from complement-mediated membrane damage. Therefore, in another embodiment of the invention,
recombinant proteins encoded by SEQ ID NO:25 or fragments thereof are administered during
xenogeneic tissue transplantation to prevent complement-mediated lysis and to block activation of
endothelial cells, which normally leads to hyperacute rejection.

In addition, prevention of complement-mediated lysis may be particularly important in human and
animal reproductive therapy, where functional survival of the germ cells during in vitro handling is
crucial. Storage of sperm is of widespread importance in commercial animal breeding programs, human
sperm donor programs, and in the treatment of certain disease states. For example, sperm samples may be
frozen for men who have been diagnosed with cancer or other diseases that may eventually interfere with
sperm production, as well as for assisted reproduction purposes where sperm may be stored for use at
other locations or times. The procedures utilized in such cases include: washing a sperm sample to
separate out the sperm-rich fraction from non-sperm components of a sample such as seminal plasma or
debris; further isolating the healthy, motile sperm from dead sperm or from white blood cells in an
ejaculate; freezing or refrigerating of sperm for use at a later date or for shipping to females at differing
locations; extending or diluting sperm for culture in diagnostic testing or for use in therapeutic
interventions such as in vitro fertilization or intracytoplasmic sperm injection (Cohen et al. 12:994-1001
(1997)). Once sperm have been washed or isolated, they are then extended (or diluted) in culture or
holding media for a variety of uses (sperm analysis, diagnostic tests, assisted reproduction). Each of these
uses for extended or diluted sperm requires a somewhat different formulation of basal medium (see, for
review, US Patent No. 6,140,121 Ellington et al. Oct. 2000); however, in all cases sperm survival is
suboptimal outside of the female reproductive tract. Novel additional components of a dilution or storage
medium which could improve the functional preservation of sperm would be useful. Therefore, in
another preferred embodiment of this invention, purified recombinant proteins encoded by SEQ ID
NO:25 or fragments thereof can be added as components of pharmacological media designed to protect
spermatozoa. The methods used to compose such preservation media are generally known by those
skilled in the art (e.g., Oliver S.A. et al. US patent 5,897,987 Apr. 1999; Cohen J. et al., supra).
Conversely, in yet another embodiment of this invention, ligands, inhibitors, neutralizing antibodies or other
biological agents which recognize, bind, and block the protein of the invention can
be used as components of pharmacological formulations designed for male contraception purposes.

In still another embodiment of this invention, chimeric ligands or derivatives which
recognize the protein of the invention or part thereof and which could be internalized into cells can be used
to design a system of drug delivery finely targeted toward urogenital and other tissues which express the
protein of SEQ ID NO:266. For example, such recognizing molecules can be incorporated into the
membranes of liposomes to allow the specific delivery of the liposomes to cells expressing the protein of
SEQ ID NO:266. Methods of designing such drug delivery systems are known by those skilled in the art
(Smith H.J. Introduction to the principles of drug design and action, 3rd ed. (1998)).

Proteins of SEQ ID NOs:417, 413, and 418 (internal designations 188-45-1-0-D3-CS, 188-26-4-0-F5-CS, and
188-5-1-0-H6-CS)

The proteins of SEQ ID NOs:417, 413, and 418, encoded by the cDNAs of SEQ ID NOs:176,
172, and 177, are expressed in the brain and exhibit strong homology with proteins with redox activity
(see, e.g., Genbank accession numbers AK001293 and AF029689, and Geneseqp accession number
Y59180).

The protein of SEQ ID NO:418 (320 amino acids) is a variant of AK001293 (322 amino acids).
AK001293 has six extra nucleotides within the same ORF as SEQ ID NO:418, producing a protein that is
two amino acids longer. SEQ ID NO:418 exhibits the Pfam Zinc-binding dehydrogenase (adh zinc) signature from
positions 16 to 313. SEQ ID NO:418 presents all the conserved residues of the motif except for a
histidine that is thought to be a zinc-ligand. This lack of zinc-ligand residues is a feature of the quinone
oxidoreductases (QOR), a subfamily of zinc-binding dehydrogenases.

SEQ ID NO:413 (191 amino acids) shares the first 172 amino acids with SEQ ID NO:418. The
deletion of one nucleotide at position 583 in the SEQ ID NO:413 cDNA sequence (corresponding to
amino acid 173), however, creates a shift of reading frame compared to SEQ ID NO:418 and AK001293.

SEQ ID NO:417 is a short protein (20 amino acids) whose sequence corresponds to the
N-terminal end of the other proteins of the invention. The presence of a T (instead of a G in public
sequences and SEQ ID NOs:413 and 418) at position 128 on the cDNA creates a STOP codon, resulting in a
shorter protein.
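
The two kinds of sequence change described above, a single-nucleotide deletion that shifts the reading frame and a G-to-T substitution that creates a STOP codon, can be illustrated with a short translation example. The following Python sketch uses a hypothetical 21-nucleotide toy coding sequence, not the cDNAs of SEQ ID NOs:172, 176, or 177, and its positions (7 and 10) are chosen only for the toy sequence. The helper names are likewise illustrative, and the standard genetic code is assumed.

```python
# Standard genetic code, enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str) -> str:
    """Translate from the first base until the first stop codon or the end."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitute(cdna: str, position: int, base: str) -> str:
    """Replace the base at a 1-based position."""
    return cdna[:position - 1] + base + cdna[position:]


def delete_base(cdna: str, position: int) -> str:
    """Delete the base at a 1-based position."""
    return cdna[:position - 1] + cdna[position:]


# Hypothetical toy ORF: ATG GCT GGA GAA CTG AAA TGA
reference = "ATGGCTGGAGAACTGAAATGA"

print(translate(reference))                      # MAGELK
# G -> T at position 7 turns GGA into the stop codon TGA.
print(translate(substitute(reference, 7, "T")))  # MA
# Deleting position 10 shifts the frame: ATG GCT GGA AAC TGA ...
print(translate(delete_base(reference, 10)))     # MAGN
```

As in the text, the frameshifted product shares its N-terminal residues with the reference and diverges from the point of the deletion, while the substitution simply truncates the protein.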

SEQ ID NOs:417, 413 and 418 are similar to the QORs, a family of zinc-binding
dehydrogenases. QORs are cytoplasmic redox-regulated flavoenzymes that catalyze the one- or
two-electron reduction of quinones. QORs bind NADP and are inhibited by dicoumarol.

The activity of QORs protects cells against toxicity, mutagenicity, and cancer due to exposure to
environmental and synthetic quinones and their precursors. Thus, QORs play a central role in monitoring
cellular redox state and act to protect against oxidative stress induced by a variety of metabolic situations
(Raina A.K. et al. (1999) Redox Rep. 4:23-7). The oxidoreductase activity also permits the activation of
bioreductive anticancer drugs (Begleiter A. et al. (1996) Br. J. Cancer Suppl. 27:S9-14).

The metabolism of quinones involves enzymatic reduction of the quinone by one or two
electrons. In the activation of quinone-containing antitumor agents, this reduction results in the formation
of the semiquinone or the hydroquinone of the anticancer drug. The consequence of a one-electron
reduction is that the semiquinone yields its extra electron to oxygen with the formation of superoxide
radical anion and the original quinone. This reduction by a reductase followed by oxidation by molecular
oxygen (dioxygen) is known as redox-cycling and continues until the system becomes anaerobic. In the
case of a two-electron reduction, the hydroquinone could become stable, and as such, be excreted by the
organism in a detoxification pathway.
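
The one- and two-electron pathways described above can be summarized schematically as follows. This is a generic sketch only, in which Q denotes the quinone, Q with a radical anion superscript denotes the semiquinone, and QH2 denotes the hydroquinone. The reductase and its electron donor are omitted, and the symbols are notation introduced here for illustration.

```latex
\begin{align*}
\text{one-electron reduction:} \quad
  & \mathrm{Q} + e^{-} \longrightarrow \mathrm{Q}^{\bullet -} \\
\text{reoxidation by dioxygen:} \quad
  & \mathrm{Q}^{\bullet -} + \mathrm{O_2} \longrightarrow \mathrm{Q} + \mathrm{O_2}^{\bullet -} \\
\text{two-electron reduction:} \quad
  & \mathrm{Q} + 2e^{-} + 2\mathrm{H}^{+} \longrightarrow \mathrm{QH_2}
\end{align*}
```

The first two steps together regenerate the quinone and consume one molecule of dioxygen per cycle, which is why redox-cycling continues until the system becomes anaerobic, whereas the third step bypasses the semiquinone.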

The cellular antioxidant response is mediated by a battery of detoxifying/defensive proteins. The
promoters of genes that encode these proteins contain a common cis-element termed the antioxidant
response element (ARE). Many transcription factors, including Nrf, Jun, Fos, Fra, Maf, YABP,
ARE-BP1, Ah (aromatic hydrocarbon) receptor, and estrogen receptor bind to the ARE from various genes.
Among these factors, Nrf-Jun heterodimers positively regulate ARE-mediated expression and induction
of genes in response to antioxidants and xenobiotics (reviewed in Dhakshinamoorthy S. et al. (2000)
Curr. Top Cell Regul. 36:201-16). On the other hand, c-Fos represses ARE-mediated gene expression
(Venugopal, R., and Jaiswal, A.K. (1996) Proc. Natl. Acad. Sci. USA 93, 14960-5).

Elevated levels of QOR activity have been reported in several kinds of tumors such as liver,
colon, lung and breast (Belinsky M., Jaiswal A.K. (1993) Cancer Metastasis Rev. 12:103-17).
Bioreductive antitumor agents are an important class of anticancer drugs that require activation by
reduction. For this reason, QORs are a potential target on which to base the development of new
antitumor compounds. Certain QORs have already been implicated in the metabolism, activation and
mechanism of cytotoxicity of some anticancer drugs such as mitomycin C, indoloquinone EO9 (Ross D.
et al. (1994) Oncol. Res. 6:493-500), CB 1954 (Knox R.J. et al. (2000) Cancer Res. 60:4179-86) or
antiestrogens in breast cancer (Montano M.M., Katzenellenbogen B.S. (1997) PNAS 94:2581-6).

In addition, some of the proteins of the QOR family are thought to play a role in the prevention of 
apoptosis following oxidative stress. The tumor suppressor gene p53 has been directly implicated in the 
induction of apoptosis in dividing cells and in hippocampal pyramidal neurons (Jordan J. et al. (1997) J.
Neurosci. 17:1397-405) and a QOR gene has been described as a p53-regulated gene (Kostic C., Shaw
P.H. (2000) Oncogene 19:3978-87).

It is believed that the proteins of SEQ ID NOs:417, 413 and 418 have a redox activity, most likely 
as QORs. Thus, they are expected to act as an endogenous antioxidant against oxidative stress and may 
be able to use NADP as a cofactor. The proteins of the invention may be used to deactivate toxins and to
activate bioreductive anticancer drugs. In addition, they may prevent apoptosis following oxidative stress
and be regulated by p53. Because the proteins of SEQ ID NOs:417 and 413 do not contain the Pfam
zinc-binding dehydrogenase (adh_zinc) signature, in contrast to SEQ ID NO:418, they may act as competitive
inhibitors, i.e. dominant negative forms, of the functional protein.

The oxidoreductase activity of the proteins of the invention may be assayed using any technique
known to those skilled in the art. For example, the measurement of the rate of oxidation of NADPH and
oxygen consumption, and the detection of the semiquinone and reactive oxygen species, may be
performed as described by Gutierrez P.L. (Gutierrez P.L. (2000) Front. Biosci. 5:D629-38), or by any
other method known to those skilled in the art. The enzymatic activity of the proteins of the invention in
different affected and control tissues may be assayed by histochemical staining. To confirm the role of the
proteins of the invention in the cellular antioxidant response, in vitro and in vivo assays may be performed.
Transcription levels of the genes coding for the proteins of the invention may be measured using standard
techniques after exposure to quinones or derived compounds such as beta-naphthoflavone (beta-NF), as
described by Belinsky M. and Jaiswal A.K. (supra), as well as in response to transcription factors such as
Nrf, Jun and c-Fos, or in the presence of p53.
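
By way of illustration only, the rate of NADPH oxidation mentioned above is commonly derived from the
decrease in absorbance at 340 nm using the Beer-Lambert law. The following minimal sketch assumes a
spectrophotometric readout at 340 nm and the standard literature value of 6220 M^-1 cm^-1 for the molar
absorption coefficient of NADPH; the function names and the example figures are hypothetical and are not
taken from the cited reference.

```python
# Molar absorption coefficient of NADPH at 340 nm (standard literature value).
NADPH_EPSILON_340 = 6220.0  # M^-1 cm^-1


def nadph_oxidation_rate(delta_a340_per_min, path_length_cm=1.0):
    """Return the NADPH oxidation rate in micromolar per minute.

    delta_a340_per_min is the slope of absorbance at 340 nm versus time;
    it is negative while NADPH is being consumed, so its magnitude is used.
    """
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    molar_per_min = abs(delta_a340_per_min) / (NADPH_EPSILON_340 * path_length_cm)
    return molar_per_min * 1e6  # convert M/min to micromolar/min


def specific_activity(delta_a340_per_min, assay_volume_ml, protein_mg,
                      path_length_cm=1.0):
    """Return nmol of NADPH oxidized per minute per mg of protein."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    rate_um_per_min = nadph_oxidation_rate(delta_a340_per_min, path_length_cm)
    # micromolar x millilitre = nanomole
    nmol_per_min = rate_um_per_min * assay_volume_ml
    return nmol_per_min / protein_mg


if __name__ == "__main__":
    # Hypothetical reading: absorbance falls by 0.0622 per minute in a 1 ml assay
    # containing 0.5 mg of protein.
    print(nadph_oxidation_rate(-0.0622))
    print(specific_activity(-0.0622, assay_volume_ml=1.0, protein_mg=0.5))
```

Comparing the value obtained with and without dicoumarol would be one way to attribute the measured rate
to a QOR-type activity, since QORs are inhibited by dicoumarol.
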

In one embodiment of the present invention, the present protein can be used to detect specific cell
types in vitro or in vivo. For example, as the present proteins are overexpressed in the brain, reagents
capable of specifically recognizing the present protein can be used as markers for brain cells.
Brain-specific markers have a number of uses, including for the identification of specific tissues for
histological analyses, as well as to detect the origin of tumor cells. In addition, as the expression of the
present protein is likely induced by transcription factors such as Nrf, Jun, Fos, Fra, Maf, YABP, ARE-BP1, Ah
(aromatic hydrocarbon) receptor, and estrogen receptor, as well as by p53, reagents specific for detecting
the present protein can also be used as a marker for the activity of any of these proteins in vitro or in vivo.
In view of the association between many of these proteins and diseases such as cancer, the ability to
detect the presence or absence of the proteins provides powerful tools for disease diagnosis and screening.
For any of these applications, the expression of the present protein can be detected using any standard
method, including Northern blots, Western blots, in situ hybridization, PCR, etc.

In another embodiment, the proteins of the invention can serve as markers for cellular oxidative
stress in vivo and in vitro. As such, the proteins of the invention or part thereof may be useful in the
diagnosis of disorders in which oxidative stress is implicated, including a large variety of types of cancer
as well as neurodegenerative disorders such as Alzheimer's disease (AD), amyotrophic lateral sclerosis
(ALS) or Parkinson's disease (PD). For diagnostic purposes, the expression of the protein of the invention
may be investigated using, e.g. Northern blotting, RT-PCR or immunoblotting methods and compared to
the expression in control individuals. An increased level of the proteins of the invention in patients
compared with controls indicates a major shift in redox balance and, thus, indicates the presence of the
disease or of a susceptibility for the disease.
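
As a non-limiting illustration of the comparison between patient and control expression levels, the sketch
below applies the widely used comparative threshold-cycle (2^-ddCt) calculation to quantitative RT-PCR
data. The use of quantitative RT-PCR, the presence of a reference (housekeeping) transcript, and all names
and figures are assumptions made for the example; the specification does not prescribe this calculation.

```python
def relative_expression(ct_target_patient, ct_reference_patient,
                        ct_target_control, ct_reference_control):
    """Return the fold change of the target transcript in a patient sample
    relative to a control sample, using the comparative Ct method.

    Each Ct is the threshold cycle of the target or of a reference
    (housekeeping) transcript; the method assumes roughly 100% PCR efficiency.
    """
    delta_patient = ct_target_patient - ct_reference_patient
    delta_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_patient - delta_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical values: the target crosses threshold 3 cycles earlier in the
    # patient sample once normalized to the reference transcript.
    fold = relative_expression(22.0, 18.0, 25.0, 18.0)
    print(f"fold change relative to control: {fold:.1f}")
```

A fold change above 1 corresponds to an increased level in the patient sample; the cut-off above which such
an increase is considered diagnostic would have to be established empirically.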

The invention further relates to methods and compositions using the proteins of the invention or 
part thereof to prevent and/or treat disorders in which oxidative stress is implicated, including those 
mentioned above. For these purposes the proteins themselves, or polynucleotides encoding the proteins,
or an activator of protein expression may be administered to patients, or to disease-free individuals in
case of increased susceptibility to one of these disorders.

In another embodiment, the proteins of the invention or part thereof are used to prevent cells from
undergoing apoptosis. They may thus be useful in the diagnosis, treatment and/or prevention of disorders
and processes in which apoptosis is deleterious, including but not limited to immune deficiency
syndromes (including AIDS), type I diabetes, pathogenic infections, cardiovascular and neurological
injury, alopecia, aging, degenerative diseases including AD and PD, dystonia, Leber's hereditary optic
neuropathy and schizophrenia. For all such diagnostic purposes, the expression of the proteins of the
invention can be investigated using any of the Northern blotting, RT-PCR or immunoblotting methods
described herein and compared to the expression in control individuals.






The invention relates to methods and compositions using the proteins of the invention or part
thereof as detoxifying enzymes against quinones. There are a variety of quinones with a toxic effect in
cells (e.g. quinones derived from the oxidation of phenolic metabolites of benzene, DA-quinones, or
menadione). Thus, the proteins of the invention or part thereof may be protective against the hematotoxic
and carcinogenic effects of benzene, as well as against benzene-caused diseases such as cancer, aplastic
anemia and pancytopenia. Moreover, they may detoxify DA-quinones in the brain, thereby providing
neuroprotection in Parkinson's disease. In still another embodiment, the proteins of the invention or part
thereof may protect cells against menadione-induced oxidative stress, with known effects on myocardial
cells (Floreani M. et al. (2000) Biochem. Pharmacol. 60:601-5). For prevention and/or treatment purposes
the proteins themselves, or polynucleotides encoding the proteins, or an activator of protein expression
may be administered.

In another embodiment, the present proteins may be a target of chemotherapy specific to different
kinds of cancer, to ensure a favorable response to anticancer drugs. Specifically, proteins of the
invention or part thereof may be used as an activator of cytotoxic prodrugs of the quinone family.
Accordingly, the protein of the invention or part thereof may be administered to a patient in conjunction
with a bioreductive anticancer agent in order to activate the drug. This co-administration may be by
simultaneous administration, such as a mixture of the oxidoreductase and the drug, or by separate
simultaneous or sequential administration. Cancer-specific antitumor agents based on QOR substrates
may be designed as described by Xing J. et al. (Xing J. et al. (2000) Med. Chem. 43:457-66) and assayed
as described in Li B. et al. (Li B. et al. (1999) Chem. Res. Toxicol. 12:1042-9). Alternatively, as the
present proteins may be overexpressed in tumor cells, such methods may be performed by simply
detecting the level of the present protein in tumor cells, and administering the prodrug specifically to
those patients found to have elevated levels of the protein in their tumor cells.

Proteins of SEQ ID NOs:415, 310, 317 (internal designation 188-29-2-0-H1-CS, 188-18-4-0-A9-CS,
188-9-2-0-E1-CS)

Mammalian inositol hexakisphosphate kinase 2 (IP6K2), an enzyme of the inositol phosphate
pathway, has been cloned and described by two independent groups [Saiardi, A.; Erdument-Bromage, H.;
Snowman, A. M.; Tempst, P.; and Snyder, S. H. (1999) Current Biology 9, 1323-1326, and Katai, K.;
Miyamoto, K.-I.; Kishida, S.; Segawa, H.; Nii, T.; Tanaka, H.; Tani, Y.; Arai, H.; Tatsumi, S.; Morita, K.;
Taketani, Y.; and Takeda, E. (1999) Biochem. J. 343, 705-712]. Newly identified consensus sequences of
inositol-polyphosphate kinases are represented by
[LV]-[LA]-[DE]-X(3-8)-P-X-[VAI]-[ML]-D-X-K-[ML]-G [Saiardi, A.; Erdument-Bromage, H.; Snowman, A. M.;
Tempst, P.; and Snyder, S. H. (1999) Current Biology 9, 1323-1326]. IP6K2 catalyzes the transfer of
phosphate groups from InsP6 or Ins(1,3,4,5,6)P5 (the substrate) to another protein or small molecule,
such as a nucleoside diphosphate.
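
For illustration, the consensus sequence given above can be translated into a regular expression and used
to scan a polypeptide sequence. The sketch below is one possible way to do so, assuming that bracketed
groups denote alternative residues, that X denotes any residue, and that X(3-8) denotes three to eight
arbitrary residues; the helper names and the test sequence are hypothetical and do not correspond to any
sequence of the Sequence Listing.

```python
import re

# One motif element: either a bracketed residue class such as [LV], or a single
# residue letter optionally followed by a repeat range such as (3-8).
_ELEMENT = re.compile(
    r"\[(?P<cls>[A-Z]+)\]"
    r"|(?P<res>[A-Z])(?:\((?P<lo>\d+)(?:[-,](?P<hi>\d+))?\))?"
)

IPK_CONSENSUS = "[LV]-[LA]-[DE]-X(3-8)-P-X-[VAI]-[ML]-D-X-K-[ML]-G"


def motif_to_regex(motif):
    """Convert a consensus motif written as above into a regex string."""
    parts = []
    for match in _ELEMENT.finditer(motif):
        if match.group("cls"):
            parts.append("[" + match.group("cls") + "]")
            continue
        unit = "." if match.group("res") == "X" else match.group("res")
        if match.group("lo"):
            hi = match.group("hi") or match.group("lo")
            unit += "{" + match.group("lo") + "," + hi + "}"
        parts.append(unit)
    return "".join(parts)


def find_motif(sequence, motif=IPK_CONSENSUS):
    """Return (start, matched_text) for each non-overlapping motif match."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group(0)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(motif_to_regex(IPK_CONSENSUS))
    # Hypothetical peptide constructed to contain the motif.
    print(find_motif("MKLLDAAAPAVMDAKMGQQ"))
```

A match of this kind indicates only that a sequence is consistent with the consensus; kinase activity would
still have to be confirmed experimentally.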

The subject invention provides the polypeptides of SEQ ID NOs:415, 310, and 317, encoded by
the cDNAs of SEQ ID NOs:174, 69, and 76, respectively. The invention also provides biologically active
fragments of SEQ ID NOs:415, 310, and 317. In one embodiment, the polypeptides of SEQ ID NOs:415,
310, and 317 are interchanged with the corresponding polypeptides encoded by the human cDNA of clone
188-29-2-0-H1-CS, 188-18-4-0-A9-CS, or 188-9-2-0-E1-CS. "Biologically active fragments" are defined
as those peptide or polypeptide fragments having at least one of the biological functions of the full length
protein (e.g., kinase activity). Compositions of the protein/polypeptide of SEQ ID NOs:415, 310, or 317,
or biologically active fragments thereof, are also provided by the subject invention. These compositions
may be made according to methods well known in the art.

The invention also provides variants of the protein of SEQ ID NOs:415, 310, or 317. These
variants have at least about 80%, more preferably at least about 90%, and most preferably at least about
95% amino acid sequence identity to the amino acid sequences of SEQ ID NOs:415, 310, and
317. Variants according to the subject invention also have at least one functional or structural
characteristic of the protein of SEQ ID NOs:415, 310, or 317. The invention also provides biologically
active fragments of the variant proteins. Compositions of variants, or biologically active fragments
thereof, are also provided by the subject invention. These compositions may be made according to
methods well known in the art. Unless otherwise indicated, the methods disclosed herein can be practiced
utilizing the protein of SEQ ID NO:415, 310, or 317, biologically active fragments of SEQ ID
NO:415, 310, or 317, variants of SEQ ID NO:415, 310, or 317, and biologically active fragments of the
variants.
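
As an illustration of the identity thresholds recited above, percent amino acid identity may be computed
from a pairwise alignment. The minimal sketch below assumes that the two sequences have already been
aligned (gaps written as "-") and that identity is expressed relative to the number of aligned columns;
other conventions for the denominator exist, and the function names and example sequences are
hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Return percent identity between two pre-aligned sequences.

    Columns in which both sequences carry a gap are ignored; a column counts
    as identical only when both sequences carry the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


def identity_tier(pct):
    """Return the highest recited threshold (95, 90 or 80) met, else None."""
    for threshold in (95, 90, 80):
        if pct >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical aligned fragments differing at one of ten positions.
    pct = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
    print(pct, identity_tier(pct))
```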

Because of the redundancy of the genetic code, a variety of different DNA sequences can encode
the amino acid sequence of SEQ ID NO:415, 310, or 317. In a preferred embodiment, SEQ ID NO:415,
310, or 317 is encoded by clone 188-29-2-0-H1-CS, 188-18-4-0-A9-CS, or 188-9-2-0-E1-CS, or by the
cDNAs of SEQ ID NO:174, 69, or 76. It is well within the skill of a person trained in the art to create
these alternative DNA sequences which encode proteins having the same, or essentially the same, amino
acid sequence. These variant DNA sequences are, thus, within the scope of the subject invention. As
used herein, reference to "essentially the same" sequence refers to sequences that have amino acid
substitutions, deletions, additions, or insertions that do not materially affect biological activity.
Fragments retaining one or more characteristic biological activity of the protein encoded by SEQ ID
NO:415, 310, or 317 are also included in this definition.
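
The extent of this redundancy can be illustrated by counting the DNA sequences that encode a given
peptide under the standard genetic code: the count is the product, over all residues, of the number of
synonymous codons for each residue. The sketch below is illustrative only; the function name and the
example peptides are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod  # available from Python 3.8

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
# (third position varying fastest); "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def count_encodings(peptide):
    """Return the number of distinct coding DNA sequences for a peptide."""
    peptide = peptide.upper()
    unknown = sorted(set(peptide) - set(CODONS_PER_RESIDUE) | ({"*"} & set(peptide)))
    if unknown:
        raise ValueError("unsupported residue(s): " + ", ".join(unknown))
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide)


if __name__ == "__main__":
    # Hypothetical tripeptide Met-Lys-Leu: 1 x 2 x 6 synonymous codons.
    print(count_encodings("MKL"))
```

Even a short peptide is therefore encoded by many alternative DNA sequences, and the number grows
multiplicatively with the length of the protein.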

In one aspect of the subject invention, SEQ ID NO:415, 310, or 317, and variants thereof, can be
used to generate polyclonal or monoclonal antibodies. Both biologically active and immunogenic
fragments of SEQ ID NO:415, 310, or 317, or variant proteins, can be used to produce antibodies.
Polyclonal and/or monoclonal antibodies can be made according to methods well known to the skilled
artisan. Antibodies produced in accordance with the subject invention can be used in a variety of
detection assays known to those skilled in the art. The antibodies may be used to agonize or antagonize
the biological activity of the protein of SEQ ID NO:415, 310, or 317.

The protein of SEQ ID NO:415, 310, or 317 can be used for the synthesis of nucleoside
triphosphate (NTP) compounds. In one embodiment, the NTP compound produced is ATP, GTP, CTP,
or TTP. In this aspect of the subject invention, SEQ ID NO:415, 310, or 317 removes a phosphate from
InsP6 or Ins(1,3,4,5,6)P5 and transfers it to a nucleoside diphosphate (e.g., ADP, CDP, GDP, or TDP) to
create an NTP. The conditions and methods for the synthesis of NTP compounds, such as ATP, are well
known to the skilled artisan. Thus, the protein of SEQ ID NO:415, 310, or 317 has an industrially useful
function for the synthesis of commercially valuable products.

The subject invention also provides methods of determining the relative amounts of InsP6 or
Ins(1,3,4,5,6)P5 in the cell by a kinase assay. In this aspect of the invention, SEQ ID NO:415, 310, or
317 can be used to transfer phosphate groups from InsP6 or Ins(1,3,4,5,6)P5 to acceptor substrates
according to well-known kinase activity assays.

Protein of SEQ ID NO:294 (internal designation 181-16-2-0-A7-CS)

The protein of SEQ ID NO:294 is encoded by the cDNA of SEQ ID NO:53. It will be
appreciated that all characteristics and uses of the polypeptide of SEQ ID NO:294 described throughout
the present application also pertain to the polypeptide encoded by the human cDNA of clone
181-16-2-0-A7-CS. In addition, it will be appreciated that all characteristics and uses of the nucleic acid
of SEQ ID NO:53 described throughout the present application also pertain to the human cDNA of clone
181-16-2-0-A7-CS.

This gene was isolated from fetal liver and expression has also been detected in fetal kidney,
placenta, liver, brain, hypertrophic prostate, salivary gland and testis. Data from PCT application WO
98/23435 indicate expression is primarily in bone marrow cell lines, and to a lesser extent, in human
endometrial stromal cells, human adult small intestine and human pancreas tumor. PCT application WO
99/14484 reports the fraction of expression in the gastrointestinal system (0.227), reproductive system
(0.193), and hematopoietic/immune system (0.168). Finally, this protein is 55% identical and 76%
similar to CGI-128 protein, which was isolated from CD34+ cells and is also found in cell lines from the
hematopoietic lineage including HL60 (granulocytic), Jurkat (T-lymphocytic), K562
(erythro-megakaryocytic), and U937 (monocytic).

Supernatant harvested from cells expressing the product of this gene has been shown to increase
the permeability of the plasma membrane of renal mesangial cells to calcium. Thus, it is believed that the
product of this gene is involved in activating a signal transduction pathway when it binds a receptor on
the surface of the plasma membrane of both mesangial cells and other cell types, in addition to other cell
lines or tissue cell types. Thus, polynucleotides and polypeptides have uses, which include, but are not
limited to, activating mesangial cells by contacting said cells with a full length polypeptide or a
polypeptide fragment which demonstrates this biological activity. Further, the polynucleotides and
polypeptides can be used in the methods described in WO9915652, incorporated in its entirety. Binding
of a ligand to a receptor is known to alter intracellular levels of small molecules, such as calcium,
potassium and sodium, as well as alter pH and membrane potential. Alterations in small molecule
concentration can be measured to identify supernatants that bind to receptors of a particular cell. In
addition, when tested against fibroblast cell lines, supernatants removed from cells containing this gene
activated the EGR1 (early growth response gene 1) promoter element. Thus, it is likely that this gene
activates fibroblast cells through the EGR1 signal transduction pathway. EGR1 is a separate signal
transduction pathway from Jak-STAT; genes containing the EGR1 promoter are induced in various
tissues and cell types upon activation, leading the cells to undergo differentiation and proliferation (PCT
application WO 98/23435).

Polynucleotides comprising sequences encoding the signal peptide of the protein, e.g.
VLWLSGLSEPGAA/RQ, can be used in construction of secretion vectors. These vectors would then
facilitate the secretion of fusion proteins into the media of cells that have been transfected with the
construct of interest. Antibodies which specifically bind the signal peptide could be used to purify the
fusion protein from the media if desired.

Polynucleotides and polypeptides of the invention are useful as reagents for differential
identification of the tissue(s) or cell type(s) present in a biological sample and for diagnosis of diseases
and conditions which include, but are not limited to, haemopoietic and gastrointestinal tract disorders and
stromatosis, in addition to endothelial, mucosal, or epithelial cell disorders. Similarly, polypeptides and
antibodies directed to these polypeptides are useful in providing immunological probes for differential
identification of the tissue(s) or cell type(s). For a number of disorders of the above tissues or cells,
particularly of the immune and digestive systems, expression of this gene at significantly higher or lower
levels may be routinely detected in certain tissues and cell types (e.g. hematopoietic, immune,
reproductive, gastrointestinal, endocrine, and cancerous and wounded tissues) or bodily fluids (e.g.
lymph, serum, plasma, urine, synovial fluid and spinal fluid) or another tissue or cell sample taken from
an individual having such a disorder, relative to the standard gene expression level, i.e., the expression
level in healthy tissue or bodily fluid from an individual not having the disorder.

The tissue distribution in bone marrow cells, fetal liver and fetal kidney, combined with the
detected calcium flux and EGR1 biological activity, indicates that polynucleotides and polypeptides
corresponding to this gene are useful for immune and gastrointestinal tract disorders, and stromatosis,
particularly tumors and proliferative disorders. More specifically, polynucleotides and polypeptides
corresponding to this gene are useful for the treatment and diagnosis of hematopoietic related disorders
such as anemia, pancytopenia, leukopenia, thrombocytopenia or leukemia since stromal cells are
important in the production of cells of hematopoietic lineages. The polypeptides and polynucleotides of
the invention can be used to enhance hematopoiesis as described in WO9831385, incorporated in its
entirety. The uses include bone marrow cell ex vivo culture, bone marrow transplantation, bone marrow
reconstitution, radiotherapy or chemotherapy of neoplasia. The gene product may also be involved in
lymphopoiesis; therefore, it can be used in immune disorders such as infection, inflammation, allergy,
immunodeficiency etc. In addition, this gene product may have commercial utility in the expansion of
stem cells and committed progenitors of various blood lineages, and in the differentiation and/or
proliferation of various cell types. The protein, as well as antibodies directed against the protein, may show
utility as a tumor marker and/or immunotherapy target for the above listed tissues.

Additionally, since the gene product of 181-16-2-0-A7-CS has been shown to activate the EGR1
promoter element, it likely activates EGR1 signaling activity in fibroblasts. Recent data show that
activation of EGR1 plays a role in wound repair. The cellular transcription factor early growth response
factor 1 (Egr-1) is expressed minutes after acute injury and serves to stimulate the production of a class of
growth factors whose role is to promote tissue repair. Egr-1 expression at the site of dermal wounding in
rodents promotes angiogenesis in vitro and in vivo, increases collagen production, and accelerates wound
closure. These results show that Egr-1 gene therapy accelerates the normal healing process (Human Gene
Ther 2000, vol 11(15):2143-58). Thus, an activator of EGR1 signaling, specifically the gene products of
181-16-2-0-A7-CS (polypeptides and polynucleotides), would be useful in the wound healing process
using the methods described in WO9941282 and WO9932135, incorporated by reference in their
entireties.

Protein of SEQ ID NO:305 (internal designation 187-37-0-0-c10-CS)

The protein of SEQ ID NO:305, encoded by the cDNA of SEQ ID NO:64, is highly expressed in
the prostate and brain. The protein of the invention is strongly homologous to the D9 protein, found in
both humans (GNP accession number: U95006 and U95007) and in mice (GNP accession number:
U95003, U95004, and U95005). D9 is a myeloid precursor protein transcript regulated by the retinoic
acid receptor α, hereafter referred to as RAR-α (Scott et al. Blood 1996; 88:2517-30).

Retinoic acid is the active metabolite of vitamin A, which contributes to a wide range of
biological processes such as cellular differentiation, embryogenesis, and tumor suppression. More
specifically, retinoic acid stimulates myeloid precursor differentiation into mature granulocytes. For
instance, in vitro treatment of acute promyelocyte leukemia blast cells with retinoic acid induces their
differentiation (Miyauchi et al. Leuk Lymphoma 1999;33:267-80). In addition, treatment with retinoic
acid can induce disease remission in patients affected with promyelocyte leukemia by causing
granulocyte precursor differentiation (Slack et al. Ann Hematol 2000;79:227-38).

The diverse range of responses to retinoic acid is mediated by three receptor subtypes: RAR-α,
RAR-β, and RAR-γ. RAR-α has been identified as being important for bone marrow maturation of
granulocytes (Tsai et al. Genes Dev 1992;6:2258-69). In addition, RAR-α is almost invariably involved
in acute promyelocyte leukemia cells by a reciprocal translocation between the long arms of
chromosomes 15 and 17 (Alcalay et al., Proc Natl Acad Sci USA 1991;88:1977-81). This type of
leukemia is mainly characterized by a predominance of malignant promyelocytes, and severe hemorrhagic
manifestations resulting from activation of the coagulation cascade and the fibrinolytic system (Tallman
et al. Semin Thromb Hemost 1999;25:209-15). Reciprocal chromosomal translocation leads to the
production of a fusion protein that inhibits differentiation and promotes survival of myeloid precursor
cells (Grignani et al. Cell 1993;74:423-431). Transient transfection of a vector containing RAR-α in a
promyelocyte cell line causes early upregulation of several genes, including D9, which is
strongly related to the protein of SEQ ID NO:305 (Scott et al. Blood 1996; 88:2517-30). Thus, it is believed
that the protein of SEQ ID NO:305 is a myeloid-related protein whose expression is induced by the
activation of retinoic acid receptors, including RAR-α.

In a preferred embodiment, the protein of the invention or part thereof may be used to assay the
activity of RAR-α protein or retinoic acid in a biological sample. Specifically, as the expression of the
protein is believed to be under the direct control of retinoic acid receptors, the level of the protein of the
invention, or of the mRNA encoding the protein, can serve as a sensitive and immediate marker for the
effects of retinoic acid upon a cell. An ability to detect retinoic acid receptor activation in cells using the
present protein has numerous uses. For instance, the protein of the invention or part thereof can be used
to monitor the effects of retinoic acid on cells of a patient undergoing retinoic acid treatment for
promyelocyte leukemia (Slack et al. Ann Hematol 2000;79:227-38). As retinoic acid treatment is
associated with frequent delayed dose-dependent side effects, it is believed that an assay based on the protein
of SEQ ID NO:305 could be used to adjust the dose of retinoic acid administered in patients affected with
promyelocyte leukemia, in order to predict and avoid such adverse side effects (Slack et al. Ann Hematol
2000;79:227-38).

In another embodiment, the present polypeptides and polynucleotides can be used to identify
myeloid precursors, as well as brain and prostate tissues. The ability to specifically visualize myeloid
precursor cells, as well as brain and prostate tissues (and cells derived from the tissues), is useful for any
of a number of applications, including to determine the origin or identity of, e.g. cancerous cells, as well
as to facilitate the identification of particular cells and tissues for, e.g. the evaluation of histological slides.
In addition, such assays can be used to examine the extent of differentiation in myeloid precursor cells.

The present invention further relates to in vitro assays and diagnostic kits based on the protein of
the present invention or part thereof. Such assays may be used for diagnosis of disorders where the
protein activity is abnormally downregulated, such as cancer, and hematological disorders including
leukemia. As the protein of SEQ ID NO:305, RAR-α, and acute promyelocyte leukemia are all related,
variation in the measured level of the protein of the invention or part thereof can be used as a
diagnostic or screening test for acute promyelocyte leukemia, e.g. using a biological sample such as
serum or plasma. Further, an assay that can detect an abnormal level of the protein of the invention or
part thereof can be used to detect residual disease in acute promyelocyte leukemia. Such an assay may
be used to aid therapeutic decisions in this disorder, e.g. more or less aggressive treatments, the duration
of treatment, etc.

In another embodiment, various methods can be used to modulate activity and/or expression of
the protein of SEQ ID NO:305, e.g. for the treatment, attenuation and/or prevention of various disorders.
In one embodiment, any of a number of reagents, e.g. polynucleotides encoding the protein of SEQ ID
NO:305 or a fragment thereof, the protein of SEQ ID NO:305 itself, or a compound that increases the
expression or activity of the protein of SEQ ID NO:305, can be administered to a patient suffering from,
or at risk of developing, various disorders including cancer, and hematological diseases such as leukemia
and neutropenia. For instance, and without limitation, proteins or other compounds capable of enhancing
the expression or activity of the protein of SEQ ID NO:305 can be administered to treat patients affected
with acute promyelocyte leukemia, in order to induce differentiation of the affected cells into mature
granulocytes (Slack et al. Ann Hematol 2000;79:227-38). In still another preferred embodiment, proteins or other
compounds capable of increasing the expression or activity of the protein of the invention can be used to
treat, prevent and/or attenuate neutropenia or agranulocytosis in patients, in order to induce in vivo
differentiation of myeloid precursors into mature granulocytes. In still another preferred embodiment,
proteins or other compounds capable of increasing the expression or activity of the protein of SEQ ID
NO:305 can be used to treat coagulopathic diseases, such as thrombosis or hemorrhagic manifestations.
For instance, they can be used to treat disseminated intravascular coagulation, a severe hemorrhagic
syndrome. This embodiment is supported by the fact that acute promyelocyte leukemia is frequently
associated with disseminated intravascular coagulation (Tallman et al. Semin Thromb Hemost
1999;25:209-15), and disseminated intravascular coagulation is efficiently corrected with retinoic acid
(Dombret et al. Leukemia 1993;7:2-9).

In addition, modulation of the expression or activity of the protein of the invention can be used to
modulate differentiation of cells, e.g. promyelocyte leukemia cells. In one such embodiment, the protein of
the invention is inhibited, e.g. using antisense molecules, antibodies, or small molecule inhibitors of the
expression or activity of the protein, in order to maintain the undifferentiated state of cells grown in vitro.
Alternatively, agents that increase the expression or activity of the protein in cells can be used to induce
cellular differentiation, e.g. in the preparation of specific cell types in vitro for particular therapeutic
applications.

Proteins of SEQ ID NO:248 (internal designation 105-035-2-0-C6-CS) and SEQ ID NO:313 (internal
designation 188-28-4-0-D4-CS)

The proteins of SEQ ID NO:248, encoded by the cDNA of SEQ ID NO:7, and SEQ ID NO:313,
encoded by the cDNA of SEQ ID NO:72, are highly expressed in brain, liver, pancreas, and testis. The
proteins of the invention are nuclear proteins (Miller et al. J Biol Chem 2000; 275:32052-6) that display a
membrane-spanning segment from amino acids 58 to 78. These proteins are homologous to the human
RNA polymerase II elongation factor ELL3 (EMBL accession number AF276512; Miller et al. J Biol
Chem 2000; 275:32052-6). In addition, the proteins of SEQ ID NO:248 and SEQ ID NO:313 share
sequence homology with two other members of the polymerase II elongation factor family: ELL and
ELL2. The protein of SEQ ID NO:313 is similar to the N-terminal sequence of the protein of SEQ ID
NO:248, but differs after residue 240 because of a frameshift that produces a premature stop in the
sequence SEQ ID NO:72 (Miller et al. J Biol Chem 2000; 275:32052-6). Additionally, the alignment of
the protein of SEQ ID NO:248 with occludin, an integral membrane protein found at tight junctions
(Furuse et al. J Cell Biol 1994; 127:1617-26), reveals that both proteins display a C-terminal ZO-1
binding domain, with a 26% homology over a 108 amino acid segment. The protein of SEQ ID NO:313 lacks
this domain, as its C-terminal region is truncated as compared to the protein of SEQ ID NO:248. ZO-1 is
part of the family of membrane-associated guanylate kinase homologs (MAGUKs) believed to be
important in signal transduction originating from sites of cell-cell contact (Willott et al. Proc Natl Acad
Sci USA 1993; 90:7834-8).

The proteins of SEQ ID NOs:248 and 313 are RNA polymerase II elongation factors that increase
the catalytic rate of transcription elongation, a phase during which RNA polymerase II moves along the
DNA and extends the growing RNA chain (Miller et al. J Biol Chem 2000; 275:32052-6). Specifically,
the proteins of SEQ ID NOs:248 and 313 suppress transient pausing at multiple sites along the DNA,
thereby altering the Km and/or the Vmax of the polymerase (Miller et al. J Biol Chem 2000; 275:32052-6).
The present proteins belong to a family that is known to include one virally encoded protein (Tat) and six
cellular proteins (SII, P-TEFb, TFIIF, Elongin (SIII), ELL and ELL2).

A growing body of evidence suggests that mis-regulation of elongation may be a key element in a
variety of human diseases (see, Aso et al. J Clin Invest 1996; 97:1561-9). For instance, two RNA
polymerase II elongation proteins have been implicated in oncogenesis: ELL, which is a frequent target
for translocation in acute myeloid leukemia (Thirman et al. Proc Natl Acad Sci USA 1994; 91:12110-4;
Mitani et al. Blood 1995; 85:2017-24), and elongin, which is a transcription factor regulated by the
product of the von Hippel-Lindau tumor suppressor gene, which is itself mutated in the majority of
clear-cell renal carcinomas and in families with von Hippel-Lindau disease (Duan et al. Science
1995; 269:1402-6; Kibel et al. Science 1995; 269:1444-6). In addition, overexpression of ELL leads to the
transformation of fibroblasts (Kanda et al. J Biol Chem 1998; 273:5248-52). Thus, the proteins of
SEQ ID NOs:248 and 313 may be important for oncogenesis of multiple types of neoplastic diseases,
especially hematological malignancies.

In one embodiment, the present proteins are used to increase the rate of transcription in vitro.
Such an increase can be used for any of the large number of in vitro transcription reactions which are
routinely used for molecular biological applications, e.g. for the preparation of RNA, for protein
production, for the characterization of promoters and transcription factors, etc.

In another embodiment, the present invention provides diagnostic tools for the detection of
mutations in the genes encoding SEQ ID NOs:248 or 313. Such mutations may be detected by a variety
of techniques, including RNase and S1 protection assays; alterations in electrophoretic mobility of DNA
fragments in gels, with or without denaturing agents, such as SSCP or DGGE; dHPLC; and direct DNA
sequencing. The detection of mutations in the genes encoding SEQ ID NOs:248 or 313 is useful for the
detection of a number of diseases and conditions, such as cancers and hematological malignancies
including leukemia. For example, the RNA polymerase II Elongation Factor ELL gene undergoes
frequent translocations in acute myeloid leukemia (Thirman et al. Proc Natl Acad Sci USA 1994;
91:12110-4; Mitani et al. Blood 1995; 85:2017-24), and it is likely that other elongation factors are
involved in additional such diseases.

Another embodiment of the present invention relates to compositions and methods for using the
proteins or part thereof to specifically visualize myeloid precursor cells, as well as pancreas, liver and
testis tissues (and cells derived from the tissues). The ability to detect such cell types is useful for any of
a number of applications, including to determine the origin or identity of, e.g. cancerous cells, as well as 
to facilitate the identification of particular cells and tissues for, e.g. the evaluation of histological slides. 
In addition, such methods can be used to examine the extent of differentiation in myeloid or
myeloid-progenitor cells for staging of leukemia or any other neoplastic disorder. Any method for detecting the
presence of the proteins of the invention, or nucleic acids encoding the proteins, can be used, including
methods involving the use of antibodies immunospecific for the proteins of the invention. Such antibodies
can be used in various methods including radioimmunoassays, competitive binding assays, Western Blot
analysis and enzyme-linked immunosorbent assays (ELISA), or any other technique known to those
skilled in the art. In another embodiment, the present protein or part thereof can be used for the treatment,
attenuation and/or prevention of conditions associated with unbalanced amounts and/or activity of the
protein of SEQ ID NO:248 or 313. Other modulatory substances can also be used in such embodiments,
including chemical compounds such as agonists and antagonists, nucleic acids including antisense and
ribozyme sequences, and antibodies. In a preferred embodiment, such substances are employed for the
treatment or prevention of certain types of neoplastic disorders, such as cancer or hematological
malignancies such as leukemia. In such embodiments, where an increased level of expression or activity
of the present proteins is correlated with the presence of a disease such as cancer, the disease can be
treated or prevented using any agent that can provoke a decrease in the level of activity or expression of
the protein, such as antibodies, antisense molecules, ribozymes, dominant negative forms of the protein,
compounds that inhibit the expression or activity of the proteins, and others. Alternatively, in cases
where a decreased level of expression or activity of the proteins is correlated with the presence of a
disease such as cancer, the disease can be treated using any agent that can cause an increase in the
expression or activity of the protein, such as polynucleotides encoding the proteins, purified forms of the
proteins, or any compound that causes an increase in the expression or activity of the proteins. Further,
any detection of a correlation between the level of expression or activity of the protein and the presence or
absence of a disease can be used to develop diagnostic or screening tools for the detection of the disease
itself, or of a predisposition for the disease.

Uses of antibodies 

Antibodies of the present invention have uses that include, but are not limited to, methods known
in the art to purify, detect, and target the polypeptides of the present invention including both in vitro and
in vivo diagnostic and therapeutic methods. An example of such use using immunoaffinity
chromatography is given below. The antibodies of the present invention may be used either alone or in
combination with other compositions. For example, the antibodies have use in immunoassays for
qualitatively and quantitatively measuring levels of antigen-bearing substances, including the polypeptides
of the present invention, in biological samples (See, e.g., Harlow et al., 1988, incorporated by reference
in its entirety). The antibodies may also be used in therapeutic compositions for killing cells expressing the
protein or reducing the levels of the protein in the body.

The invention further relates to antibodies that act as agonists or antagonists of the polypeptides
of the present invention. For example, the present invention includes antibodies that disrupt the
receptor/ligand interactions with the polypeptides of the invention either partially or fully. Included are
both receptor-specific antibodies and ligand-specific antibodies. Included are receptor-specific
antibodies, which do not prevent ligand binding but prevent receptor activation. Receptor activation (i.e.,
signaling) may be determined by techniques described herein or otherwise known in the art. Also included
are receptor-specific antibodies which prevent both ligand binding and receptor activation. Likewise,
included are neutralizing antibodies that bind the ligand and prevent binding of the ligand to the receptor,
as well as antibodies that bind the ligand, thereby preventing receptor activation, but do not prevent the
ligand from binding the receptor. Further included are antibodies that activate the receptor. These
antibodies may act as agonists for either all or less than all of the biological activities affected by
ligand-mediated receptor activation. The antibodies may be specified as agonists or antagonists for biological
activities comprising specific activities disclosed herein. The above antibody agonists can be made using
methods known in the art. See, e.g., WO 96/40281; US Patent 5,811,097; Deng et al (1998); Chen et al
(1998); Harrop et al (1998); Zhu et al (1998); Yoon et al (1998); Prat et al (1998); Pitard et al (1997);
Liautard et al (1997); Carlson et al (1997); Taryman et al (1995); Muller et al (1998); Bartunek et al
(1996) (said references incorporated by reference in their entireties).

As discussed above, antibodies of the polypeptides of the invention can, in turn, be utilized to
generate anti-idiotypic antibodies that "mimic" polypeptides of the invention using techniques well
known to those skilled in the art (See, e.g. Greenspan and Bona (1989) and Nissinoff (1991), which
disclosures are hereby incorporated by reference in their entireties). For example, antibodies which bind
to and competitively inhibit polypeptide multimerization or binding of a polypeptide of the invention to
ligand can be used to generate anti-idiotypes that "mimic" the polypeptide multimerization or binding
domain and, as a consequence, bind to and neutralize polypeptide or its ligand. Such neutralization
anti-idiotypic antibodies can be used to bind a polypeptide of the invention or to bind its ligands/receptors, and
thereby block its biological activity.

Immunoaffinity Chromatography

Antibodies prepared as described herein are coupled to a support. Preferably, the antibodies are
monoclonal antibodies, but polyclonal antibodies may also be used. The support may be any of those
typically employed in immunoaffinity chromatography, including Sepharose CL-4B (Pharmacia,
Piscataway, NJ), Sepharose CL-2B (Pharmacia, Piscataway, NJ), Affi-gel 10 (Biorad, Richmond, CA), or
glass beads.

The antibodies may be coupled to the support using any of the coupling reagents typically used in
immunoaffinity chromatography, including cyanogen bromide. After coupling the antibody to the
support, the support is contacted with a sample which contains a target polypeptide whose isolation,
purification or enrichment is desired. The target polypeptide may be a polypeptide selected from the
group consisting of sequences of SEQ ID Nos: 242-482, mature polypeptides included in SEQ ID Nos:
242-272 and 274-384 as well as full-length and mature polypeptides encoded by the clone inserts of the
deposited clone pool, variants and fragments thereof, or a fusion protein comprising said selected
polypeptide or a fragment thereof.

Preferably, the sample is placed in contact with the support for a sufficient amount of time and
under appropriate conditions to allow at least 50% of the target polypeptide to specifically bind to the
antibody coupled to the support.

Thereafter, the support is washed with an appropriate wash solution to remove polypeptides
which have non-specifically adhered to the support. The wash solution may be any of those typically
employed in immunoaffinity chromatography, including PBS, Tris-lithium chloride buffer (0.1M lysine
base and 0.5M lithium chloride, pH 8.0), Tris-hydrochloride buffer (0.05M Tris-hydrochloride, pH 8.0),
or Tris/Triton/NaCl buffer (50mM Tris-Cl, pH 8.0 or 9.0, 0.1% Triton X-100, and 0.5M NaCl).

After washing, the specifically bound target polypeptide is eluted from the support using the high
pH or low pH elution solutions typically employed in immunoaffinity chromatography. In particular, the
elution solutions may contain an eluant such as triethanolamine, diethylamine, calcium chloride, sodium
thiocyanate, potassium bromide, acetic acid, or glycine. In some embodiments, the elution solution may
also contain a detergent such as Triton X-100 or octyl-beta-D-glucoside.

Import vectors 

The GENSET polypeptides of the invention may also be used as a carrier to import a protein or
peptide of interest, so-called cargo, into tissue-culture cells or in host organisms. A hydrophobic region
of a GENSET polypeptide or a fragment thereof, preferably the signal peptide of a sequence selected
from the group consisting of SEQ ID Nos: 1-31 and 33-143 and clone inserts of the deposited clone
pool, more preferably the short core hydrophobic region (h) of signal peptides, may be used as a carrier.

When cell permeable peptides of limited size (approximately up to 25 amino acids) are to be
translocated across the cell membrane, chemical synthesis may be used in order to add the h region to either
the C-terminus or the N-terminus of the cargo peptide of interest. Alternatively, when longer peptides or
proteins are to be imported into cells, nucleic acids can be genetically engineered, using techniques
familiar to those skilled in the art, in order to link the cDNA sequence or fragment thereof encoding the
hydrophobic region to the 5' or the 3' end of a DNA sequence coding for a cargo polypeptide. Such
genetically engineered nucleic acids are then translated either in vitro or in vivo after transfection into
appropriate cells, using conventional techniques to produce the resulting cell permeable polypeptide.
Suitable host cells are then simply incubated with the cell permeable polypeptide which is then
translocated across the membrane.

This method may be applied to study diverse intracellular functions and cellular processes. For
instance, it has been used to probe functionally relevant domains of intracellular proteins and to examine
protein-protein interactions involved in signal transduction pathways (Lin et al, J. Biol Chem., 270:
14225-14258 (1995); Lin et al, J. Biol Chem., 271: 5305-5308 (1996); Rojas et al, J. Biol Chem., 271:
27456-27461 (1996); Rojas et al, Nature Biotech., 16: 370-375 (1998); Liu et al, Proc. Natl Acad. Sci.
USA, 93: 11819-11824 (1996); Rojas et al, Bioch. Biophys. Res. Commun., 234: 675-680 (1997); Du et
al, J. Peptide Res., 51: 235-243 (1998)).

Such techniques may be used in cellular therapy to import proteins producing therapeutic effects. 
For instance, cells isolated from a patient may be treated with imported therapeutic proteins and then re- 
introduced into the host organism. 

Alternatively, the hydrophobic region of signal peptides of the present invention could be used in
combination with a nuclear localization signal to deliver nucleic acids into the cell nucleus. Such
oligonucleotides may be antisense oligonucleotides or oligonucleotides designed to form triple helixes, as
described herein, in order to respectively inhibit processing or maturation of a target cellular RNA.

Expression of GENSET products 

Spatial expression of the GENSET genes of the invention

Tissue expression of the cDNAs of the present invention was examined. Table IX lists the
Genset libraries of tissues and cell types examined that express the polynucleotides of the present
invention. The tissues and cell types examined for polynucleotide expression were: adrenal gland (AG),
bone marrow (BM), brain (Br), cancerous prostate (CP), cerebellum (Ce), colon (Co), dystrophic muscle
(DM), fetal brain (FB), fetal kidney (FK), fetal liver (FL), heart (He), hypertrophic prostate (HP), kidney
(Ki), liver (Li), lung (Lu), lung cells (LC), lymph ganglia (LG), lymphocytes (Ly), muscle (Mu), ovary
(Ov), pancreas (Pa), pituitary gland (PG), placenta (Pl), prostate (Pr), salivary gland (SG), spinal cord
(SC), spleen (Sp), stomach/intestine (SI), substantia nigra (SN), testis (Te), thyroid (Ty), umbilical cord
(UC) and uterus (Ut).

For each cDNA referred to by its sequence identification number (first column), the number of
proprietary 5'ESTs (i.e. cDNA fragments) expressed in a particular tissue referred to by its name is
indicated after a semicolon (second column). In addition, the bias in the spatial distribution of the
polynucleotide sequences of the present invention was examined by comparing the relative proportions of
the biological polynucleotides of a given tissue using the following statistical analysis. The under- or
over-representation of a polynucleotide of a given cluster in a given tissue was assessed using the
normal approximation of the binomial distribution. When the observed proportion of a polynucleotide of
a given tissue in a given consensus had less than 1% chance to occur randomly according to the chi2 test,
the frequency bias was reported as "preferred". The results are given in Table X as follows. For each
polynucleotide showing a bias in tissue distribution as referred to by its sequence identification number in
the first column, the list of tissues where the polynucleotides are under-represented is given in the second
column entitled "low frequency expression" and the list of tissues where the polynucleotides are
over-represented is given in the third column entitled "high frequency expression".
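The statistical test described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used to build Table X: the function name `tissue_bias`, its parameters, and all EST counts are hypothetical. It relies on the fact that, for a single cluster in a single tissue, the squared z-score of the normal approximation to the binomial equals the chi2 statistic with one degree of freedom, so the 1% threshold corresponds to a two-sided normal p-value below 0.01.

```python
import math


def tissue_bias(k, n, p, alpha=0.01):
    """Classify the representation of one cluster in one tissue library.

    k: ESTs of the cluster observed in the tissue library (hypothetical count)
    n: total ESTs sequenced from that tissue library (hypothetical count)
    p: expected proportion of the cluster, e.g. its share over all libraries
    alpha: significance threshold (the text uses 1%)
    """
    expected = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    # z-score under the normal approximation of the binomial distribution
    z = (k - expected) / sd
    # two-sided p-value of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    if p_value >= alpha:
        return "no bias", z, p_value
    label = "over-represented" if z > 0 else "under-represented"
    return label, z, p_value
```

With these illustrative numbers, `tissue_bias(30, 10000, 0.001)` reports the cluster as over-represented (30 ESTs observed where 10 are expected), whereas `tissue_bias(10, 10000, 0.001)` reports no bias. The normal approximation is only reasonable when the expected counts are not too small.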

The cellular localization of some polypeptides of the invention was also determined using the
"psort" software (Nakai and Horton (1999); Nakai and Kanehisa (1992), which disclosures are hereby
incorporated by reference in their entireties). For each polypeptide identified by its sequence
identification number in the first column, the second column of Table XI lists the predicted subcellular
localization.

Evaluation of Expression Levels and Patterns of GENSET mRNAs 

The spatial and temporal expression patterns of GENSET mRNAs, as well as their expression 
levels, may also be further determined as follows. 

Expression levels and patterns of GENSET mRNAs may be analyzed by solution hybridization
with long probes as described in International Patent Application No. WO 97/05277, the entire contents
of which are hereby incorporated by reference. Briefly, a GENSET polynucleotide, or fragment thereof
corresponding to the gene encoding the mRNA to be characterized is inserted at a cloning site
immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce
antisense RNA. Preferably, the GENSET polynucleotide is at least 100 nucleotides in length. The
plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified
ribonucleotides (i.e. biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in
solution with mRNA isolated from cells or tissues of interest. The hybridizations are performed under
standard stringent conditions (40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8).
The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (i.e.
RNases CL3, T1, Phy M, U2 or A). The presence of the biotin-UTP modification enables capture of the
hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables
the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline
phosphatase.

The GENSET cDNAs, or fragments thereof may also be tagged with nucleotide sequences for the
serial analysis of gene expression (SAGE) as disclosed in UK Patent Application No. 2 305 241 A, the
entire contents of which are incorporated by reference. In this method, cDNAs are prepared from a cell,
tissue, organism or other source of nucleic acid for which it is desired to determine gene expression
patterns. The resulting cDNAs are separated into two pools. The cDNAs in each pool are cleaved with a
first restriction endonuclease, called an "anchoring enzyme," having a recognition site which is likely to
be present at least once in most cDNAs. The fragments which contain the 5' or 3' most region of the
cleaved cDNA are isolated by binding to a capture medium such as streptavidin coated beads. A first
oligonucleotide linker having a first sequence for hybridization of an amplification primer and an internal
restriction site for a "tagging endonuclease" is ligated to the digested cDNAs in the first pool. Digestion
with the second endonuclease produces short "tag" fragments from the cDNAs. A second oligonucleotide
having a second sequence for hybridization of an amplification primer and an internal restriction site is
ligated to the digested cDNAs in the second pool. The cDNA fragments in the second pool are also
digested with the "tagging endonuclease" to generate short "tag" fragments derived from the cDNAs in
the second pool. The "tags" resulting from digestion of the first and second pools with the anchoring
enzyme and the tagging endonuclease are ligated to one another to produce "ditags." In some
embodiments, the ditags are concatamerized to produce ligation products containing from 2 to 200 ditags.
The tag sequences are then determined and compared to the sequences of the GENSET cDNAs to
determine which genes are expressed in the cell, tissue, organism, or other source of nucleic acids from
which the tags were derived. In this way, the expression pattern of a GENSET gene in the cell, tissue,
organism, or other source of nucleic acids is obtained.
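The final comparison step of the SAGE method, in which sequenced tags are matched against known cDNA sequences, may be illustrated by the following sketch. It is an illustration under stated assumptions rather than the method of the cited application: the anchoring-enzyme recognition site (CATG), the tag length (10 nucleotides), the function names and the example sequences are all hypothetical choices, and the handling of ditags and concatemers is omitted.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme site; not specified in the text
TAG_LENGTH = 10       # assumed tag length; not specified in the text


def extract_tag(cdna, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag adjacent to the 3'-most anchoring site, or None."""
    pos = cdna.rfind(site)
    if pos == -1:
        return None  # no anchoring site in this cDNA
    start = pos + len(site)
    tag = cdna[start:start + tag_length]
    # discard tags truncated by the end of the sequence
    return tag if len(tag) == tag_length else None


def count_expression(observed_tags, reference):
    """Count observed tags per reference cDNA.

    reference maps a cDNA name to its sequence; tags that match no
    reference cDNA are ignored.
    """
    tag_to_name = {}
    for name, seq in reference.items():
        tag = extract_tag(seq)
        if tag is not None:
            tag_to_name[tag] = name
    return Counter(tag_to_name[t] for t in observed_tags if t in tag_to_name)
```

For a hypothetical reference such as `{"cDNA_A": "GGGCATGAAACCCGGGTTTAAA"}`, the tag of cDNA_A is the ten nucleotides following its last CATG, and the number of times that tag is observed among the sequenced tags serves as a measure of the expression of the corresponding gene.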

Quantitative analysis of GENSET gene expression may also be performed using arrays. For
example, quantitative analysis of gene expression may be performed with GENSET polynucleotides, or
fragments thereof in a complementary DNA microarray as described by Schena et al (1995 and 1996)
which disclosures are hereby incorporated by reference in their entireties. GENSET cDNAs or fragments
thereof are amplified by PCR and arrayed from 96-well microtiter plates onto silylated microscope slides
using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the
array elements and rinsed, once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in
sodium borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2%
SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C. Cell or tissue mRNA is
isolated or commercially obtained and probes are prepared by a single round of reverse transcription.
Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm glass coverslip for 6-12 hours at 60°C.
Arrays are washed for 5 min at 25°C in low stringency wash buffer (1X SSC/0.2% SDS), then for 10 min
at room temperature in high stringency wash buffer (0.1X SSC/0.2% SDS). Arrays are scanned in 0.1X
SSC using a fluorescence laser scanning device fitted with a custom filter set. Accurate differential
expression measurements are obtained by taking the average of the ratios of two independent
hybridizations.
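The averaging of ratios mentioned above may be sketched as follows. The data layout is an assumption made for illustration: each hybridization is represented as a mapping from an array element name to a hypothetical pair of signals (test sample, reference sample), and the function name `differential_expression` is not taken from the cited publications.

```python
def differential_expression(hyb1, hyb2):
    """Average the test/reference signal ratios of two hybridizations.

    hyb1, hyb2: dicts mapping an array element name to a
    (test_signal, reference_signal) pair with a non-zero reference signal.
    Only elements measured in both hybridizations are reported.
    """
    averaged = {}
    for element in hyb1.keys() & hyb2.keys():
        ratio1 = hyb1[element][0] / hyb1[element][1]
        ratio2 = hyb2[element][0] / hyb2[element][1]
        averaged[element] = (ratio1 + ratio2) / 2.0
    return averaged
```

An element giving ratios of 2.0 and 3.0 in the two hybridizations is thus reported with an average ratio of 2.5.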

Quantitative analysis of the expression of genes may also be performed with GENSET cDNAs or
fragments thereof in complementary DNA arrays as described by Pietu et al (1996), which disclosure is
hereby incorporated by reference in its entirety. The GENSET polynucleotides of the invention or
fragments thereof are PCR amplified and spotted on membranes. Then, mRNAs originating from various
tissues or cells are labeled with radioactive nucleotides. After hybridization and washing in controlled
conditions, the hybridized mRNAs are detected by phospho-imaging or autoradiography. Duplicate
experiments are performed and a quantitative analysis of differentially expressed mRNAs is then
performed.

Alternatively, expression analysis of GENSET genes can be done through high density nucleotide
arrays as described by Lockhart et al (1996) and Sosnowski et al (1997), which disclosures are hereby
incorporated by reference in their entireties. Oligonucleotides of 15-50 nucleotides corresponding to
sequences of a GENSET polynucleotide or fragments thereof are synthesized directly on the chip
(Lockhart et al, supra) or synthesized and then addressed to the chip (Sosnowski et al, supra).
Preferably, the oligonucleotides are about 20 nucleotides in length. cDNA probes labeled with an
appropriate compound, such as biotin, digoxigenin or fluorescent dye, are synthesized from the
appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides.
The probes are then hybridized to the chip. After washing as described in Lockhart et al (supra) and
application of different electric fields (Sosnowski et al, supra), the dyes or labeling compounds are
detected and quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of
the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples
indicates a differential expression of the GENSET mRNA.
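The selection of chip oligonucleotides from a polynucleotide sequence may be illustrated by the sketch below. It simply enumerates overlapping oligonucleotides of a given length; the function name `tile_probes`, the `step` parameter and the example sequence are hypothetical, and actual chip design as described by Lockhart et al and Sosnowski et al involves additional selection criteria that are not modeled here.

```python
def tile_probes(sequence, length=20, step=1):
    """List the oligonucleotides of a fixed length tiled along a sequence.

    length: oligonucleotide length, limited to the 15-50 nucleotide range
            given in the text (about 20 being preferred)
    step:   offset between the starts of consecutive oligonucleotides
    """
    if not 15 <= length <= 50:
        raise ValueError("length must be between 15 and 50 nucleotides")
    if step < 1:
        raise ValueError("step must be at least 1")
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]
```

For a hypothetical 28-nucleotide sequence, `tile_probes(seq, length=20, step=4)` yields three oligonucleotides starting at positions 0, 4 and 8.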

Uses of GENSET expression data

Once the expression levels and patterns of a GENSET mRNA have been determined using any
technique known to those skilled in the art, in particular those described in the section entitled
"Evaluation of Expression Levels and Patterns of GENSET mRNAs", or using the instant disclosure,
this information may be used to design GENSET specific markers for detection, identification,
screening and diagnosis purposes as well as to design DNA constructs with an expression pattern similar
to a GENSET expression pattern.

Detection of GENSET expression and/or biological activity

The invention further relates to methods of detection of GENSET expression and/or biological
activity in a biological sample using the polynucleotide and polypeptide sequences described herein. Such
methods can be used, for example, as a screen for normal or abnormal GENSET expression and/or
biological activity and, thus, can be used diagnostically. The biological sample for use in the methods of
the present invention includes a suitable sample from, for example, a mammal, particularly a human. For
example, the sample can be obtained from tissues or cell lines having the same origin as tissues or cell lines
in which the polypeptide is known to be expressed using the data from Table IX.

Detection of GENSET products 

The invention further relates to methods of detection of GENSET polynucleotides or polypeptides
in a sample using the sequences described herein and any techniques known to those skilled in the art.
For example, a labeled polynucleotide probe having all or a functional portion of the nucleotide sequence
of a GENSET polynucleotide can be used in a method to detect a GENSET polynucleotide in a sample.
In one embodiment, the sample is treated to render the polynucleotides in the sample available for
hybridization to a polynucleotide probe, which can be DNA or RNA. The resulting treated sample is
combined with a labeled polynucleotide probe having all or a portion of the nucleotide sequence of the
GENSET cDNA or genomic sequence, under conditions appropriate for hybridization of complementary
sequences to occur. Detection of hybridization of polynucleotides from the sample with the labeled
nucleic acid probe indicates the presence of GENSET polynucleotides in a sample. The presence of GENSET
mRNA is indicative of GENSET expression.

Consequently, the invention comprises methods for detecting the presence of a polynucleotide
comprising a nucleotide sequence selected from a group consisting of the sequences of SEQ ID Nos:
1-241, the sequences of clone inserts of the deposited clone pool, sequences fully complementary thereto,
fragments and variants thereof in a sample. In a first embodiment, said method comprises the following
steps of:

a) bringing into contact said sample and a nucleic acid probe or a plurality of nucleic acid probes 
which hybridize to said selected nucleotide sequence; and 

b) detecting the hybrid complex formed between said probe or said plurality of probes and said 
polynucleotide. 

In a preferred embodiment of the above detection method, said nucleic acid probe or said plurality
of nucleic acid probes is labeled with a detectable molecule. In another preferred embodiment of the
above detection method, said nucleic acid probe or said plurality of nucleic acid probes has been
immobilized on a substrate. In still another preferred embodiment, said nucleic acid probe or said
plurality of nucleic acid probes has a sequence comprised in a sequence complementary to said selected
sequence.

In a second embodiment, said method comprises the following steps of: 

a) contacting said sample with amplification reaction reagents comprising a pair of amplification
primers located on either side of the region of said nucleotide sequence to be amplified;

b) performing an amplification reaction to synthesize amplification products containing said 
region of said selected nucleotide sequence; and 

c) detecting said amplification products. 

In a preferred embodiment of the above detection method, when the polynucleotide to be
amplified is an RNA molecule, preliminary reverse transcription and synthesis of a second cDNA strand
are necessary to provide a DNA template to be amplified. In another preferred embodiment of the above
detection method, the amplification product is detected by hybridization with a labeled probe having a
sequence which is complementary to the amplified region. In still another preferred embodiment, at least
one of said amplification primers has a sequence comprised in said selected sequence or in the sequence
complementary to said selected sequence.
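The relationship between a pair of amplification primers and the region they delimit may be illustrated in silico as follows. This sketch assumes exact primer matches on a DNA template given as a string of A, C, G and T; the function names and the example sequences are hypothetical and do not correspond to any sequence of the invention.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplified_region(template, forward_primer, reverse_primer):
    """Return the region delimited by the two primers, or None.

    template:       one strand of the target, written 5' to 3'
    forward_primer: primer identical to a stretch of the template strand
    reverse_primer: primer written 5' to 3' and complementary to the
                    template strand, so that its reverse complement is
                    searched for downstream of the forward primer
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    downstream_site = reverse_complement(reverse_primer)
    end = template.find(downstream_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(downstream_site)]
```

With the hypothetical template `AAAACCGGTTGACTGACTGACGGATCCTTTT`, the forward primer `CCGGTT` and the reverse primer `TCCGTC` delimit a 20-nucleotide region that begins with the forward primer sequence and ends with the reverse complement of the reverse primer.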

Alternatively, a method of detecting GENSET expression in a test sample can be accomplished
using any product which binds to a GENSET polypeptide of the present invention or a portion of a
GENSET polypeptide. Such products may be antibodies, binding fragments of antibodies, polypeptides
able to bind specifically to GENSET polypeptides or fragments thereof, including GENSET agonists and
antagonists. Detection of specific binding to the antibody indicates the presence of a GENSET
polypeptide in the sample (e.g., ELISA).

Consequently, the invention is also directed to a method for detecting specifically the presence of 
a GENSET polypeptide according to the invention in a biological sample, said method comprising the 
following steps of: 

a) bringing into contact said biological sample with a product able to bind to a polypeptide of the
invention or fragments thereof;

b) allowing said product to bind to said polypeptide to form a complex; and

c) detecting said complex.

In a preferred embodiment of the above detection method, the product is an antibody. In a more
preferred embodiment, said antibody is labeled with a detectable molecule. In another more preferred
embodiment of the above detection method, said antibody has been immobilized on a substrate.

In addition, the invention also relates to methods of determining whether a GENSET product (e.g. 
a polynucleotide or polypeptide) is present or absent in a biological sample, said methods comprising the 
steps of:

a) obtaining said biological sample from a human or non-human animal, preferably a mammal;

b) contacting said biological sample with a product able to bind to a GENSET polynucleotide or 
polypeptide of the invention; and 

c) determining the presence or absence of said GENSET product in said biological sample. 

Compounds that specifically bind a GENSET product may be either compounds binding to a 
GENSET polypeptide (e.g. binding proteins, antibodies or binding fragments thereof, such as F(ab')2 
fragments) or compounds binding to a GENSET polynucleotide (e.g. a complementary probe or primer). 

The present invention also relates to kits that can be used in the detection of GENSET expression 
products. The kit can comprise a compound that specifically binds a GENSET polypeptide (e.g. binding 
proteins, antibodies or binding fragments thereof, such as F(ab')2 fragments) or a GENSET mRNA (e.g. a 
complementary probe or primer), for example, disposed within a container means. The kit can further 
comprise ancillary reagents, including buffers and the like. 

Detection of a GENSET biological activity 

The invention further includes methods of specifically detecting a GENSET biological activity. 
Assessing the GENSET biological activity may be performed using a variety of techniques, including 
those described herein. 

Consequently, the invention is directed to a method for specifically detecting GENSET biological 
activity in a biological sample, said method comprising the following steps: 

a) obtaining a biological sample from a human or non-human mammal; and 
b) detecting a GENSET biological activity. 

The present invention also relates to kits that can be used in the detection of GENSET biological 
activity. 

Identification of a specific context of GENSET expression 

When the expression pattern of a GENSET mRNA shows that a GENSET gene is specifically 
expressed in a given context, probes and primers specific for this gene as well as antibodies binding to the 
encoded GENSET polypeptide may then be used as markers for a specific context. Examples of specific 
contexts are: specific expression in a given tissue/cell or tissue/cell type, expression at a given stage of 
development of a process such as embryo development or disease development, or specific expression in 
a given organelle. Such primers, probes, and antibodies are useful commercially to identify 
tissues/cells/organelles of unknown origin, for example, forensic samples or differentiated tumor tissue that 
has metastasized to foreign bodily sites, or to differentiate different tissue types in a tissue cross-section 
using any technique known to those skilled in the art including, for example, in situ PCR or 
immunochemistry. 

For example, the cDNAs and proteins of the sequence listing and fragments thereof may be used 
to distinguish human tissues/cells from non-human tissues/cells and to distinguish between human 
tissues/cells/organelles that do and do not express the polynucleotides comprising the cDNAs. By 
knowing the expression pattern of a given GENSET, either through routine experimentation or by using 
the instant disclosure, the polynucleotides and polypeptides of the present invention may be used in 
methods of determining the identity of an unknown tissue/cell sample/organelle. As part of determining 
the identity of an unknown tissue/cell sample/organelle, the polynucleotides and polypeptides of the 
present invention may be used to determine what the unknown tissue/cell sample is and what the 
unknown sample is not. For example, if a cDNA is expressed in a particular tissue/cell type/organelle, 
and the unknown tissue/cell sample/organelle does not express the cDNA, it may be inferred that the 
unknown tissue/cells are either not human or not the same human tissue/cell type/organelle as that which 
expresses the cDNA. These methods of determining tissue/cell/organelle identity are based on methods 
which detect the presence or absence of the mRNA (or corresponding cDNA) in a tissue/cell sample using 
methods well known in the art (e.g., hybridization, PCR based methods, immunoassays, immunochemistry, 
ELISA). Examples of such techniques are described in more detail below. Therefore, the invention 
encompasses uses of the polynucleotides and polypeptides of the invention as tissue markers. In a 
preferred embodiment, polynucleotides preferentially expressed in given tissues as indicated in Table X 
and polypeptides encoded by such polynucleotides are used for this purpose. The invention also 
encompasses uses of polypeptides of the invention as organelle markers. In a preferred embodiment, 
polypeptides preferentially expressed in given subcellular compartments as indicated in Table XI are used 
for this purpose. 

Consequently, the present invention encompasses methods of identification of a tissue/cell 
type/subcellular compartment, wherein said methods include the steps of: 

a) contacting a biological sample whose identity is to be assayed with a product able to bind a 
GENSET product; and 

b) determining whether a GENSET product is expressed in said biological sample. 

Products that are able to bind specifically to a GENSET product, namely a GENSET polypeptide 
or a GENSET mRNA, include GENSET binding proteins, antibodies or binding fragments thereof (e.g. 
F(ab')2 fragments), as well as GENSET complementary probes and primers. 

Step b) may be performed using any detection method known to those skilled in the art including 
those disclosed herein, especially in the section entitled "Detection of GENSET expression and/or 
biological activity". 

Identification of Tissue Types or Cell Species by Means of Labeled Tissue Specific Antibodies 

Identification of specific tissues is accomplished by the visualization of tissue specific antigens by 
means of antibody preparations which are conjugated, directly (e.g., green fluorescent protein) or 
indirectly, to a detectable marker. Selected labeled antibody species bind to their specific antigen binding 
partner in tissue sections, cell suspensions, or in extracts of soluble proteins from a tissue sample to 
provide a pattern for qualitative or semi-quantitative interpretation. 

Antisera for these procedures must have a potency exceeding that of the native preparation, and 
for that reason, antibodies are concentrated to a mg/ml level by isolation of the gamma globulin fraction, 
for example, by ion-exchange chromatography or by ammonium sulfate fractionation. Also, to provide 
the most specific antisera, unwanted antibodies, for example to common proteins, must be removed from 
the gamma globulin fraction, for example by means of insoluble immunoabsorbents, before the antibodies 
are labeled with the marker. Either monoclonal or heterologous antisera are suitable for either procedure. 

A. Immunohistochemical Techniques 

Purified, high-titer antibodies, prepared as described above, are conjugated to a detectable marker, 
as described, for example, by Fudenberg (1980) or Rose et al. (1980), which disclosures are hereby 
incorporated by reference in their entireties. 

A fluorescent marker, either fluorescein or rhodamine, is preferred, but antibodies can also be 
labeled with an enzyme that supports a color producing reaction with a substrate, such as horseradish 
peroxidase. Markers can be added to tissue-bound antibody in a second step, as described below. 
Alternatively, the specific anti-tissue antibodies can be labeled with ferritin or other electron dense 
particles, and localization of the ferritin coupled antigen-antibody complexes achieved by means of an 
electron microscope. In yet another approach, the antibodies are radiolabeled with, for example, 125I, and 
detected by overlaying the antibody treated preparation with photographic emulsion. Preparations to 
carry out the procedures can comprise monoclonal or polyclonal antibodies to a single protein or peptide 
identified as specific to a tissue type, for example, brain tissue, or antibody preparations to several 
antigenically distinct tissue specific antigens can be used in panels, independently or in mixtures, as 
required. Tissue sections and cell suspensions are prepared for immunohistochemical examination 
according to common histological techniques. Multiple cryostat sections (about 4 µm, unfixed) of the 
unknown tissue and known control are mounted and each slide covered with different dilutions of the 
antibody preparation. Sections of known and unknown tissues should also be treated with preparations to 
provide a positive control, a negative control, for example, pre-immune sera, and a control for non- 
specific staining, for example, buffer. Treated sections are incubated in a humid chamber for 30 min at 
room temperature, rinsed, then washed in buffer for 30-45 min. Excess fluid is blotted away, and the 
marker developed. If the tissue specific antibody was not labeled in the first incubation, it can be labeled 
at this time in a second antibody-antibody reaction, for example, by adding fluorescein- or enzyme- 
conjugated antibody against the immunoglobulin class of the antiserum-producing species, for example, 
fluorescein labeled antibody to mouse IgG. Such labeled sera are commercially available. The antigen 
found in the tissues by the above procedure can be quantified by measuring the intensity of color or 
fluorescence on the tissue section, and calibrating that signal using appropriate standards. 

B. Identification of Tissue Specific Soluble Proteins 

The visualization of tissue specific proteins and identification of unknown tissues from that 
procedure is carried out using the labeled antibody reagents and detection strategy as described for 
immunohistochemistry; however, the sample is prepared according to an electrophoretic technique to 
distribute the proteins extracted from the tissue in an orderly array on the basis of molecular weight for 
detection. A tissue sample is homogenized using a Virtis apparatus; cell suspensions are disrupted by 
Dounce homogenization or osmotic lysis, using detergents in either case as required to disrupt cell 
membranes, as is the practice in the art. Insoluble cell components such as nuclei, microsomes, and 
membrane fragments are removed by ultracentrifugation, and the soluble protein-containing fraction 
concentrated if necessary and reserved for analysis. A sample of the soluble protein solution is resolved 
into individual protein species by conventional SDS polyacrylamide electrophoresis as described, for 
example, by Davis et al., Section 19-2 (1986), using a range of amounts of polyacrylamide in a set of gels 
to resolve the entire molecular weight range of proteins to be detected in the sample. A size marker is run 
in parallel for purposes of estimating molecular weights of the constituent proteins. Sample size for 
analysis is a convenient volume of from 5 to 55 µl, containing from about 1 to 100 µg protein. An 
aliquot of each of the resolved proteins is transferred by blotting to a nitrocellulose filter paper, a process 
that maintains the pattern of resolution. Multiple copies are prepared. The procedure, known as Western 
Blot Analysis, is well described in Davis et al. (1986), Section 19-3. One set of nitrocellulose blots is 
stained with Coomassie Blue dye to visualize the entire set of proteins for comparison with the antibody 
bound proteins. The remaining nitrocellulose filters are then incubated with a solution of one or more 
specific antisera to tissue specific proteins prepared as described herein. In this procedure, as in procedure 
A above, appropriate positive and negative sample and reagent controls are run. 
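
The molecular weight estimate obtained from the size marker run in parallel can be sketched as follows. This is a minimal illustration and not part of the disclosed procedure: it assumes that the logarithm of molecular weight varies approximately linearly with migration distance between two adjacent marker bands, and the function name, marker values and distances are hypothetical.

```python
import math


def estimate_molecular_weight(marker_bands, distance):
    """Estimate a band's molecular weight from size-marker bands.

    marker_bands: list of (migration_distance, molecular_weight) pairs for
                  the size marker lane (hypothetical values, distinct distances).
    distance:     migration distance of the band of interest, same units.

    Assumption: log10(molecular weight) is interpolated linearly between
    the two marker bands that bracket the band of interest.
    """
    points = sorted(marker_bands)
    for (d0, mw0), (d1, mw1) in zip(points, points[1:]):
        if d0 <= distance <= d1:
            fraction = (distance - d0) / (d1 - d0)
            log_mw = math.log10(mw0) + fraction * (math.log10(mw1) - math.log10(mw0))
            return 10 ** log_mw
    raise ValueError("band lies outside the range covered by the size marker")


# Hypothetical marker lane: a 100 kDa band at 10 mm and a 10 kDa band at 30 mm.
markers = [(10.0, 100.0), (30.0, 10.0)]
estimate = estimate_molecular_weight(markers, 20.0)
```

A band outside the marker range raises an error rather than being extrapolated, since the log-linear relationship is only an approximation within the resolved range of a given gel.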

In either procedure A or B, a detectable label can be attached to the primary tissue antigen- 
primary antibody complex according to various strategies and permutations thereof. In a straightforward 
approach, the primary specific antibody can be labeled; alternatively, the unlabeled complex can be bound 
by a labeled secondary anti-IgG antibody. In other approaches, either the primary or secondary antibody 
is conjugated to a biotin molecule, which can, in a subsequent step, bind an avidin conjugated marker. 
According to yet another strategy, enzyme labeled or radioactive protein A, which has the property of 
binding to any IgG, is bound in a final step to either the primary or secondary antibody. The visualization 
of tissue specific antigen binding at levels above those seen in control tissues to one or more tissue 
specific antibodies, prepared from the gene sequences identified from cDNA sequences, can identify 
tissues of unknown origin, for example, forensic samples, or differentiated tumor tissue that has 
metastasized to foreign bodily sites. 

Targeting of compounds to subcellular compartments 

GENSET polypeptides expressed in specific cellular compartments/organelles may also be used 
to target compounds to these compartments/organelles. The invention therefore encompasses uses of 
polypeptides and polynucleotides of the invention as organelle targeting tools. 

In a first embodiment, GENSET polypeptides expressed in mitochondria may be used to target 
heterologous compounds, either polypeptides or polynucleotides, to mitochondria by recombinantly or 
chemically fusing a fragment of the protein of the invention to a heterologous polypeptide or 
polynucleotide. Preferred fragments are signal peptides, amphiphilic alpha helices and/or any other 
fragments of the protein of the invention, or part thereof, that may contain targeting signals for 
mitochondria including but not limited to matrix targeting signals as defined in Herrman and Neupert 
(2000); Bhagwat et al. (1999); Murphy (1997); Glaser et al. (1998); Ciminale et al. (1999), which 
disclosures are hereby incorporated by reference in their entireties. Such heterologous compounds may 
be used to modulate mitochondrial activities. For example, they may be used to induce and/or prevent 
mitochondrial-induced apoptosis or necrosis. In addition, heterologous polynucleotides may be used for 
mitochondrial gene therapy to replace a defective mitochondrial gene and/or to inhibit the deleterious 
expression of a mitochondrial gene. 

In a second embodiment, GENSET polypeptides expressed in the endoplasmic reticulum may be 
used to target heterologous polypeptides to the endoplasmic reticulum by recombinantly or chemically fusing 
a fragment of the proteins of the invention to a heterologous polypeptide. Preferred fragments are any 
fragments of the proteins of the invention, or part thereof, that may contain targeting signals for the 
endoplasmic reticulum such as those described in Pidoux and Armstrong (1992); Munro and Pelham (1987); 
Pelham (1990), which disclosures are hereby incorporated by reference in their entireties. 

In a third embodiment, GENSET polypeptides expressed in the nucleus may be used to target 
heterologous polypeptides or polynucleotides to the nucleus by recombinantly or chemically fusing a 
fragment of the proteins or polynucleotides of the invention to a heterologous polypeptide or polynucleotide. 
Preferred fragments are any fragments of the proteins or polynucleotides of the invention, or part thereof, that 
may contain targeting signals for the nucleus (nuclear localization signals) such as those described in 
Christophe et al. (2000), which disclosure is hereby incorporated by reference in its entirety. 

In a fourth embodiment, GENSET polypeptides expressed in the Golgi apparatus may be used to target 
heterologous polypeptides to the Golgi apparatus by recombinantly or chemically fusing a fragment of the 
protein of the invention to a heterologous polypeptide. Preferred fragments are signal peptides, 
transmembrane domains, tyrosine containing regions and/or any other fragments of the proteins of the 
invention, or part thereof, that may contain (1) targeting signals for the Golgi apparatus such as the ones 
described in Ugur and Jones (2000); Picetti and Borrelli (2000), (2) tyrosine-based Golgi targeting signal 
regions (Zhan et al. (1998); Watson and Pessin (2000); Ward and Moss (2000)), or (3) any other region as 
defined in Munro (1998); Luetterforst et al. (1999); Essl et al. (1999), which disclosures are hereby 
incorporated by reference in their entireties. 

Screening and diagnosis of abnormal GENSET expression and/or biological activity 

Moreover, antibodies and/or primers specific for GENSET expression may also be used to 
identify abnormal GENSET expression and/or biological activity, and subsequently to screen and/or 
diagnose disorders associated with abnormal GENSET expression. For example, a particular disease may 
result from lack of expression, overexpression, or underexpression of a GENSET mRNA. By 
comparing mRNA expression patterns and quantities in samples taken from healthy individuals with 
those from individuals suffering from a particular disorder, genes responsible for this disorder may be 
identified. Primers, probes and antibodies specific for this GENSET may then be used to develop kits for 
screening and diagnosis of a disorder in which the gene of interest is specifically expressed or in which 
its expression is specifically dysregulated, i.e. underexpressed or overexpressed. 

Screening for specific disorders 

The present invention also relates to methods of identifying individuals having elevated or 
reduced levels of GENSET, which individuals are likely to benefit from therapies to suppress or enhance 
GENSET expression, respectively. One example of such methods comprises the steps of: 

a) obtaining from a human or non-human mammal a biological sample; 

b) detecting the presence in said sample of a GENSET product (mRNA or protein) using any 
method known to those skilled in the art including those described herein, especially in the section 
entitled "Detection of GENSET products"; 

c) comparing the amount of said GENSET product present in said sample with that of a control 
sample; and 

d) determining whether said human or non-human mammal has a reduced or elevated level of 
GENSET expression compared to the control sample. 

Such individuals with reduced or elevated levels of GENSET products may be predisposed to 
disorders associated with deregulation of GENSET gene expression and thus would be candidates for 
therapies. The identification of elevated levels of GENSET in a patient would be indicative of an 
individual that would benefit from treatment with agents that suppress GENSET expression or activity. 
The identification of low levels of GENSET in a patient would be indicative of an individual that would 
benefit from agents that induce GENSET expression or activity. 
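
The comparison against a control sample and the resulting choice of therapy type can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how large a difference counts as "elevated" or "reduced", so the two-fold threshold, the function names and the numeric levels below are hypothetical.

```python
def classify_expression(sample_level, control_level, fold_threshold=2.0):
    """Classify a GENSET product level relative to a control sample.

    fold_threshold is an assumed cut-off (not taken from the text): a
    sample at or above fold_threshold times the control is "elevated",
    at or below control / fold_threshold is "reduced", otherwise
    "comparable".
    """
    if control_level <= 0 or sample_level < 0:
        raise ValueError("levels must be non-negative and the control positive")
    ratio = sample_level / control_level
    if ratio >= fold_threshold:
        return "elevated"
    if ratio <= 1.0 / fold_threshold:
        return "reduced"
    return "comparable"


def candidate_therapy(classification):
    """Map the classification to the type of agent described in the text."""
    return {
        "elevated": "agents that suppress GENSET expression or activity",
        "reduced": "agents that induce GENSET expression or activity",
        "comparable": None,
    }[classification]


# Hypothetical measurements in arbitrary units.
status = classify_expression(sample_level=10.0, control_level=4.0)
therapy = candidate_therapy(status)
```

In practice the threshold would be set from the distribution of levels in healthy control samples rather than fixed in advance.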

Biological samples suitable for use in this method include biological fluids such as blood, lymph, 
saliva, sperm, maternal milk, and tissue samples (e.g. biopsies) as well as cell cultures or cell extracts 
derived, for example, from tissue biopsies. The detection step of the present method can be performed 
using standard protocols for protein/mRNA detection. Examples of suitable protocols include Northern 
blot analysis, immunoassays (e.g. RIA, Western blots, immunohistochemical analyses), and PCR. 

Thus, the present invention further relates to methods of identifying individuals or non-human 
animals at increased risk for developing, or presently having, certain diseases/disorders associated 
with abnormal GENSET expression or biological activity. One example of such methods comprises the 
steps of: 

a) obtaining from a human or non-human mammal a biological sample; 

b) detecting the presence or absence in said sample of a GENSET product (mRNA or protein); 

c) comparing the amount of said GENSET product present in said sample with that of a control 
sample; and 

d) determining whether said human or non-human mammal is at increased risk for developing, or 
presently has, a disease or disorder. 

In accordance with this method, the presence in the sample of altered levels of GENSET product 
indicates that the subject is predisposed to the above-indicated diseases/disorders. Biological samples 
suitable for use in this method include biological fluids such as blood, lymph, saliva, sperm, maternal 
milk, and tissue samples (e.g. biopsies). 

The diagnostic methodologies described herein are applicable to both humans and non-human 
mammals. 

Detection of GENSET mutations 

The invention also encompasses methods to detect mutations in GENSET polynucleotides of the 
invention. Such methods may advantageously be used to detect mutations occurring in GENSET genes 
and preferably in their regulatory regions. When a mutation has been shown to be associated with a disease, 
detection of such mutations may be used for screening and diagnosis purposes. 

In one embodiment of the oligonucleotide arrays of the invention, an oligonucleotide probe 
matrix may advantageously be used to detect mutations occurring in GENSET genes and preferably in 
their regulatory regions. For this particular purpose, probes are specifically designed to have a nucleotide 
sequence allowing their hybridization to the genes that carry known mutations (either by deletion, 
insertion or substitution of one or several nucleotides). By known mutations is meant mutations in the 
GENSET genes that have been identified according, for example, to the technique used by Huang et 
al. (1996) or Samson et al. (1996), which disclosures are hereby incorporated by reference in their 
entireties. 

Another technique that is used to detect mutations in GENSET genes is the use of a high-density 
DNA array. Each oligonucleotide probe constituting a unit element of the high density DNA array is 
designed to match a specific subsequence of a GENSET genomic DNA or cDNA. Thus, an array 
consisting of oligonucleotides complementary to subsequences of the target gene sequence is used to 
determine the identity of the target sequence with the wild-type gene sequence, measure its amount, and detect 
differences between the target sequence and the reference wild-type gene sequence of the GENSET gene. 
One such design, termed a 4L tiled array, uses sets of four probes (A, C, G, T), preferably 
15-nucleotide oligomers. In each set of four probes, the perfect complement will hybridize more strongly 
than mismatched probes. Consequently, a nucleic acid target of length L is scanned for mutations with a 
tiled array containing 4L probes, the whole probe set containing all the possible mutations in the known 
wild-type reference sequence. The hybridization signals of the 15-mer probe set tiled array are perturbed by a 
single base change in the target sequence. As a consequence, there is a characteristic loss of signal or a 
"footprint" for the probes flanking a mutation position. This technique was described by Chee et al. 
(1996), which disclosure is hereby incorporated by reference in its entirety. 
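
The layout of a tiled array of this kind can be sketched as follows. This is a simplified illustration, not the design of Chee et al.: it assumes that the interrogated base sits at the center of each 15-mer, that hybridization is all-or-nothing (only a perfect complement gives a signal), and that positions too close to either end of the reference to carry a full window are skipped, so the sketch yields 4(L - 14) probes rather than the full 4L. All function names and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tiled_probe_sets(reference, probe_len=15):
    """Build one set of four probes (A, C, G, T) per interrogated position.

    Each probe is complementary to a window of the reference in which the
    central base has been replaced by A, C, G or T. Keys of the returned
    dict are 0-based positions in the reference.
    """
    half = probe_len // 2
    probe_sets = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        probe_sets[center] = {
            base: reverse_complement(window[:half] + base + window[half + 1:])
            for base in "ACGT"
        }
    return probe_sets


def call_base(probe_set, target_window):
    """Idealized read-out: return the base whose probe is a perfect
    complement of the target window, or None if no probe matches."""
    for base, probe in probe_set.items():
        if reverse_complement(probe) == target_window:
            return base
    return None


reference = "ACGTACGTACGTACGTACGT"          # hypothetical 20-base reference
sets = tiled_probe_sets(reference)
target = reference[:9] + "G" + reference[10:]  # single substitution at position 9
called = call_base(sets[9], target[2:17])       # probe set centered on the change
neighbor = call_base(sets[8], target[1:16])     # adjacent set: no perfect match
```

In this idealized model the set centered on the substitution identifies the new base, while the adjacent sets find no perfectly matching probe, which corresponds to the loss of signal described as a "footprint" around the mutated position.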

Construction of DNA constructs with a GENSET expression pattern 

In addition, characterization of the spatial and temporal expression patterns and expression levels 
of GENSET mRNAs is also useful for constructing expression vectors capable of producing a desired 
level of gene product in a desired spatial or temporal manner, as discussed below. 

DNA Construct That Enables Directing Temporal And Spatial GENSET Gene Expression In Recombinant 
Cell Hosts And In Transgenic Animals. 

In order to study the physiological and phenotypic consequences of a lack of synthesis of a 
GENSET protein, both at the cell level and at the multicellular organism level, the invention also 
encompasses DNA constructs and recombinant vectors enabling a conditional expression of a specific 
allele of a GENSET genomic sequence or cDNA and also of a copy of this genomic sequence or cDNA 
harboring substitutions, deletions, or additions of one or more bases with respect to a nucleotide sequence 
selected from the group consisting of sequences of SEQ ID Nos 1-241 and sequences of clone inserts of 
the deposited clone pool, or a fragment thereof, these base substitutions, deletions or additions being 
located either in an exon, an intron or a regulatory sequence, but preferably in the 5'-regulatory sequence 
or in an exon of the GENSET genomic sequence or within the GENSET cDNA. 

A first preferred DNA construct is based on the tetracycline resistance operon tet from E. coli 
transposon Tn10 for controlling the GENSET gene expression, such as described by Gossen et al. (1992, 
1995) and Furth et al. (1994), which disclosures are hereby incorporated by reference in their entireties. 
Such a DNA construct contains seven tet operator sequences from Tn10 (tetop) that are fused to either a 
minimal promoter or a 5'-regulatory sequence of the GENSET gene, said minimal promoter or said 
GENSET regulatory sequence being operably linked to a polynucleotide of interest that codes either for a 
sense or an antisense oligonucleotide or for a polypeptide, including a GENSET polypeptide or a peptide 
fragment thereof. This DNA construct is functional as a conditional expression system for the nucleotide 
sequence of interest when the same cell also comprises a nucleotide sequence coding for either the wild 
type (tTA) or the mutant (rTA) repressor fused to the activating domain of viral protein VP16 of herpes 
simplex virus, placed under the control of a promoter, such as the HCMVIE1 enhancer/promoter or the 
MMTV-LTR. Indeed, a preferred DNA construct of the invention comprises both the polynucleotide 
containing the tet operator sequences and the polynucleotide containing a sequence coding for the tTA or 
the rTA repressor. In a specific embodiment, the conditional expression DNA construct contains the 
sequence encoding the mutant tetracycline repressor rTA; in that case, the expression of the polynucleotide 
of interest is silent in the absence of tetracycline and induced in its presence. 
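
The switching behavior of the two regulators can be summarized in a small sketch. The behavior for rTA is the one stated above; the behavior for the wild type fusion tTA (expression on without tetracycline, off with it) is the standard property of the system described by Gossen et al. and is included here as an assumption, since this passage does not state it. The function name is hypothetical.

```python
def expression_is_on(regulator, tetracycline_present):
    """Idealized on/off model of the tet-controlled construct.

    regulator: "tTA" (wild type repressor fused to VP16) or
               "rTA" (mutant repressor fused to VP16).
    Returns True when the polynucleotide of interest is expressed.
    """
    if regulator == "rTA":
        # Stated in the text: silent without tetracycline, induced with it.
        return tetracycline_present
    if regulator == "tTA":
        # Assumption (standard behavior of the wild type fusion): the reverse.
        return not tetracycline_present
    raise ValueError("unknown regulator: " + regulator)


# Enumerate the four regulator / tetracycline combinations.
truth_table = {
    (regulator, tet): expression_is_on(regulator, tet)
    for regulator in ("tTA", "rTA")
    for tet in (False, True)
}
```

Real systems show graded, dose-dependent responses and some basal expression, so this boolean model only captures the direction of the switch.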

DNA Constructs Allowing Homologous Recombination: Replacement Vectors 

A second preferred DNA construct will comprise, from 5'-end to 3'-end: (a) a first nucleotide 
sequence that is comprised in the GENSET genomic sequence; (b) a nucleotide sequence comprising a 
positive selection marker, such as the marker for neomycin resistance (neo); and (c) a second nucleotide 
sequence that is comprised in the GENSET genomic sequence, and is located on the genome downstream 
of the first GENSET nucleotide sequence (a). 

In a preferred embodiment, this DNA construct also comprises a negative selection marker 
located upstream of the nucleotide sequence (a) or downstream of the nucleotide sequence (c). Preferably, the 
negative selection marker comprises the thymidine kinase (tk) gene (Thomas et al., 1986), the 
hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van der Lugt et al., 1991; Reid et al., 
1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et al., 1993; Yagi et al., 1990), which 
disclosures are hereby incorporated by reference in their entireties. Preferably, the positive selection 
marker is located within a GENSET exon sequence so as to interrupt the sequence encoding a GENSET 
protein. These replacement vectors are described, for example, by Thomas et al. (1986; 1987), Mansour et 
al. (1988) and Koller et al. (1992). 

The first and second nucleotide sequences (a) and (c) may be indifferently located within a 
GENSET regulatory sequence, an intronic sequence, an exon sequence or a sequence containing both 
regulatory and/or intronic and/or exon sequences. The size of the nucleotide sequences (a) and (c) ranges 
from 1 to 50 kb, preferably from 1 to 10 kb, more preferably from 2 to 6 kb and most preferably from 2 to 
4 kb. 

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System. 

These new DNA constructs make use of the site specific recombination system of the P1 phage. 
The P1 phage possesses a recombinase called Cre which interacts specifically with a 34 base pair loxP 
site. The loxP site is composed of two palindromic sequences of 13 bp separated by an 8 bp conserved 
sequence (Hoess et al., 1986), which disclosure is hereby incorporated by reference in its entirety. The 
recombination by the Cre enzyme between two loxP sites having an identical orientation leads to the 
deletion of the DNA fragment located between them. 
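
The structure of the site and the outcome of excision can be sketched as follows. The 34 bp sequence used is the commonly cited wild type loxP sequence, which is not given in this passage and is therefore an addition; the string-based model, the function names and the flanking sequences are hypothetical simplifications (one strand only, two directly repeated sites, no reaction kinetics).

```python
# Commonly cited wild type loxP site: 13 bp arm + 8 bp spacer + 13 bp arm.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cre_excise(dna, site=LOXP):
    """Model Cre-mediated excision between two sites in the same orientation.

    Returns (remaining, excised): the sequence left behind, which keeps a
    single copy of the site, and the intervening segment that is removed.
    In the cell the removed segment leaves as a circle carrying the other
    copy of the site; only the intervening bases are returned here.
    If fewer than two same-orientation sites are found, nothing is excised.
    """
    first = dna.find(site)
    if first == -1:
        return dna, ""
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna, ""
    excised = dna[first + len(site):second]
    remaining = dna[:first + len(site)] + dna[second + len(site):]
    return remaining, excised


# Hypothetical construct: flank / loxP / marker segment / loxP / flank.
construct = "TTTT" + LOXP + "ACGTACGT" + LOXP + "CCCC"
remaining, excised = cre_excise(construct)
```

Because the 8 bp spacer is asymmetric, an inverted site reads differently on the same strand, so the simple string search above only pairs sites that share an orientation, which is the case that leads to deletion.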

The Cre-loxP system used in combination with a homologous recombination technique was 
first described by Gu et al. (1993, 1994), which disclosures are hereby incorporated by reference in their 
entireties. Briefly, a nucleotide sequence of interest to be inserted in a targeted location of the genome 
harbors at least two loxP sites in the same orientation and located at the respective ends of a nucleotide 
sequence to be excised from the recombinant genome. The excision event requires the presence of the 
recombinase (Cre) enzyme within the nucleus of the recombinant cell host. The recombinase enzyme 
may be provided at the desired time by (a) incubating the recombinant cell hosts in a culture medium 
containing this enzyme, injecting the Cre enzyme directly into the desired cell, such as described by 
Araki et al. (1995), which disclosure is hereby incorporated by reference in its entirety, or lipofecting 
the enzyme into the cells, such as described by Baubonis et al. (1993), which disclosure is hereby 
incorporated by reference in its entirety; (b) transfecting the cell host with a vector comprising the Cre 
coding sequence operably linked to a promoter functional in the recombinant cell host, which promoter 
is optionally inducible, said vector being introduced in the recombinant cell host, such as described by 
Gu et al. (1993) and Sauer et al. (1988), which disclosures are hereby incorporated by reference in their 
entireties; or (c) introducing in the genome of the cell host a polynucleotide comprising the Cre coding 
sequence operably linked to a promoter functional in the recombinant cell host, which promoter is 
optionally inducible, said polynucleotide being inserted in the genome of the cell host either by a 
random insertion event or a homologous recombination event, such as described by Gu et al. (1994). 

In a specific embodiment, the vector containing the sequence to be inserted in the GENSET gene 
by homologous recombination is constructed in such a way that selectable markers are flanked by loxP 
sites of the same orientation; it is then possible, by treatment with the Cre enzyme, to eliminate the selectable 
markers while leaving the GENSET sequences of interest that have been inserted by a homologous 
recombination event. Again, two selectable markers are needed: a positive selection marker to select for 
the recombination event and a negative selection marker to select for the homologous recombination 
event. Vectors and methods using the Cre-loxP system are described by Zou et al. (1994), which 
disclosure is hereby incorporated by reference in its entirety. 

Thus, a third preferred DNA construct of the invention comprises, from 5'-end to 3'-end: (a) a 
first nucleotide sequence that is comprised in the GENSET genomic sequence; (b) a nucleotide sequence 
comprising a polynucleotide encoding a positive selection marker, said nucleotide sequence comprising 
additionally two sequences defining a site recognized by a recombinase, such as a loxP site, the two sites 
being placed in the same orientation; and (c) a second nucleotide sequence that is comprised in the 
GENSET genomic sequence, and is located on the genome downstream of the first GENSET nucleotide 
sequence (a). 

The sequences defining a site recognized by a recombinase, such as a loxP site, are preferably 
located within the nucleotide sequence (b) at suitable locations bordering the nucleotide sequence for 
which the conditional excision is sought. In one specific embodiment, two loxP sites are located, one on 
each side of the positive selection marker sequence, in order to allow its excision at a desired time after the 
occurrence of the homologous recombination event. 
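
The excision of a marker flanked by two loxP sites of the same orientation can be pictured with a short string-based sketch. It is only an illustration under stated assumptions: the 34-bp sequence used for the loxP site is the commonly cited wild-type site and is not given in the present text, the homology arms and marker are hypothetical placeholder sequences, and the function name is invented for this example.

```python
# Commonly cited 34-bp wild-type loxP site (an assumption of this sketch,
# not a sequence disclosed in the surrounding text).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def excise_between_sites(sequence: str, site: str = LOXP) -> str:
    """Model Cre-mediated excision between two same-orientation sites.

    Only the first two occurrences of `site` on the given strand are used.
    The segment between them is removed and a single copy of the site is
    left behind. With fewer than two sites the sequence is returned as is.
    """
    first = sequence.find(site)
    if first == -1:
        return sequence
    second = sequence.find(site, first + len(site))
    if second == -1:
        return sequence
    # Keep everything up to and including the first site, then resume
    # immediately after the second site.
    return sequence[: first + len(site)] + sequence[second + len(site):]


# Hypothetical toy construct: 5' arm, floxed positive selection marker, 3' arm.
five_prime_arm = "GGGGCCCC"
marker = "ATGAAACCCGGGTTTTAA"  # placeholder, not a real marker gene
three_prime_arm = "TTTTAAAA"
construct = five_prime_arm + LOXP + marker + LOXP + three_prime_arm
recombined = excise_between_sites(construct)
```

In this toy model the recombined sequence retains both GENSET-derived arms and one loxP site, while the selection marker is lost.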

In a preferred embodiment of a method using the third DNA construct described above, the 
excision of the polynucleotide fragment bordered by the two sites recognized by a recombinase, 
preferably two loxP sites, is performed at a desired time, due to the presence within the genome of the 
recombinant host cell of a sequence encoding the Cre enzyme operably linked to a promoter sequence, 
preferably an inducible promoter, more preferably a tissue-specific promoter sequence and most 
preferably a promoter sequence which is both inducible and tissue-specific, such as described by Gu et 
al. (1994). 

The presence of a sequence encoding the Cre enzyme within the genome of the recombinant cell host may 
result from the breeding of two transgenic animals, the first transgenic animal bearing the GENSET-derived 
sequence of interest containing the loxP sites as described above and the second transgenic animal bearing 
the Cre coding sequence operably linked to a suitable promoter sequence, such as described by Gu et al. 
(1994). 

Spatio-temporal control of the Cre enzyme expression may also be achieved with an adenovirus-based 
vector that contains the Cre gene, thus allowing infection of cells, or in vivo infection of organs, for 
delivery of the Cre enzyme, such as described by Anton and Graham (1995) and Kanegae et al. (1995), 
which disclosures are hereby incorporated by reference in their entireties. 

The DNA constructs described above may be used to introduce a desired nucleotide sequence of 
the invention, preferably a GENSET genomic sequence or a GENSET cDNA sequence, and most 
preferably an altered copy of a GENSET genomic or cDNA sequence, within a predetermined location of 
the targeted genome, leading either to the generation of an altered copy of a targeted gene (knock-out 
homologous recombination) or to the replacement of a copy of the targeted gene by another copy 
sufficiently homologous to allow a homologous recombination event to occur (knock-in homologous 
recombination). 

Modifying GENSET expression and/or biological activity 

Modifying endogenous GENSET expression and/or biological activity is expressly contemplated 
by the present invention. 

Screening for compounds that modulate GENSET expression and/or biological activity 

The present invention further relates to compounds able to modulate GENSET expression and/or 
biological activity and methods to use these compounds. Such compounds may interact with the 
regulatory sequences of GENSET genes or they may interact with GENSET polypeptides directly or 
indirectly. 

Compounds Interacting With GENSET Regulatory Sequences 

The present invention also concerns a method for screening substances or molecules that are able 
to interact with the regulatory sequences of a GENSET gene, such as for example promoter or enhancer 
sequences in untranscribed regions of the genomic DNA, as determined using any techniques known to 
those skilled in the art including those described in the section entitled "Identification of Promoters in 
Cloned Upstream Sequences", or such as regulatory sequences located in untranslated regions of GENSET 
mRNA. 

Sequences within untranscribed or untranslated regions of polynucleotides of the invention may 
be identified by comparison to databases containing known regulatory sequences such as transcription start 
sites, transcription factor binding sites, promoter sequences, enhancer sequences, 5'UTR and 3'UTR 
elements (Pesole et al., 2000; http://igs-server.cnrs-mrs.fr/~gauthereAJTR/index.html). Alternatively, the 
regulatory sequences of interest may be identified through conventional mutagenesis or deletion analyses 
of reporter plasmids using, for instance, techniques described in the section entitled "Identification of 
Promoters in Cloned Upstream Sequences". 
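
As a rough illustration of the comparison step, the sketch below scans a nucleotide sequence for occurrences of a consensus motif written with IUPAC ambiguity codes. The consensus "TATAWA" and the example sequence are hypothetical inputs chosen for the example, the function name is invented here, and actual database comparisons would rely on curated motif collections and scoring schemes rather than exact consensus matching.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(sequence: str, consensus: str) -> list[int]:
    """Return 0-based start positions of every match of `consensus`.

    A lookahead is used so that overlapping matches are also reported.
    Only the given strand is scanned.
    """
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# Hypothetical upstream sequence and hypothetical consensus motif.
upstream = "GGTATAAAGGCTATATAT"
hits = find_motif(upstream, "TATAWA")
```

Candidate positions found this way would still need to be confirmed experimentally, for example by the mutagenesis or deletion analyses mentioned above.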

Following the identification of potential GENSET regulatory sequences, proteins which interact 
with these regulatory sequences may be identified as described below. 

Gel retardation assays may be performed independently in order to screen candidate molecules 
that are able to interact with the regulatory sequences of the GENSET gene, such as described by Fried 
and Crothers (1981), Garner and Revzin (1981) and Dent and Latchman (1993), the teachings of these 
publications being herein incorporated by reference. These techniques are based on the principle 
according to which a DNA or mRNA fragment which is bound to a protein migrates more slowly than the 
same unbound DNA or mRNA fragment. Briefly, the target nucleotide sequence is labeled. Then the labeled 
target nucleotide sequence is brought into contact with either a total nuclear extract from cells containing 
regulation factors, or with different candidate molecules to be tested. The interaction between the target 
regulatory sequence of the GENSET gene and the candidate molecule or the regulation factor is detected 
after gel or capillary electrophoresis through a retardation in the migration. 

Nucleic acids encoding proteins which are able to interact with the promoter sequence of the 
GENSET gene, more particularly a nucleotide sequence selected from the group consisting of the 
polynucleotides of the 5' and 3' regulatory region or a fragment or variant thereof, may be identified by 
using a one-hybrid system, such as that described in the booklet enclosed in the Matchmaker One-Hybrid 
System kit from Clontech (Catalog Ref. n° K1603-1), the technical teachings of which are herein 
incorporated by reference. Briefly, the target nucleotide sequence is cloned upstream of a selectable 
reporter sequence and the resulting polynucleotide construct is integrated in the yeast genome 
(Saccharomyces cerevisiae). Preferably, multiple copies of the target sequences are inserted into the 
reporter plasmid in tandem. The yeast cells containing the reporter sequence in their genome are then 
transformed with a library comprising fusion molecules between cDNAs encoding candidate proteins for 
binding onto the regulatory sequences of the GENSET gene and sequences encoding the activator domain 
of a yeast transcription factor such as GAL4. The recombinant yeast cells are plated in a culture broth for 
selecting cells expressing the reporter sequence. The recombinant yeast cells thus selected contain a 
fusion protein that is able to bind onto the target regulatory sequence of the GENSET gene. Then, the 
cDNAs encoding the fusion proteins are sequenced and may be cloned into expression or transcription 
vectors in vitro. The binding of the encoded polypeptides to the target regulatory sequences of the 
GENSET gene may be confirmed by techniques familiar to the one skilled in the art, such as gel 
retardation assays or DNase protection assays. 

Ligands interacting with GENSET polypeptides 

For the purpose of the present invention, a ligand means a molecule, such as a protein, a peptide, 
an antibody or any synthetic chemical compound, capable of binding to a GENSET protein or one of its 
fragments or variants or of modulating the expression of the polynucleotide coding for GENSET or a 
fragment or variant thereof. 

In the ligand screening method according to the present invention, a biological sample or a 
defined molecule to be tested as a putative ligand of a GENSET protein is brought into contact with the 
corresponding purified GENSET protein, for example the corresponding purified recombinant GENSET 
protein produced by a recombinant cell host as described herein, in order to form a complex between this 
protein and the putative ligand molecule to be tested. 

As an illustrative example, to study the interaction of a GENSET protein, or a fragment 
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more 
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a polypeptide selected from the group 
consisting of sequences of SEQ ID Nos: 242-482, mature polypeptides included in SEQ ID Nos: 242-272 
and 274-384 as well as full-length and mature polypeptides encoded by the clone inserts of the deposited 
clone pool, with drugs or small molecules, such as molecules generated through combinatorial chemistry 
approaches, the microdialysis coupled to HPLC method described by Wang et al. (1997) or the affinity 
capillary electrophoresis method described by Bush et al. (1997), the disclosures of which are 
incorporated by reference, can be used. 

In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which interact 
with a GENSET protein, or a fragment comprising a contiguous span of at least 6 amino acids, preferably 
at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a 
polypeptide selected from the group consisting of sequences of SEQ ID Nos: 242-482, mature 
polypeptides included in SEQ ID Nos: 242-272 and 274-384, as well as full-length and mature 
polypeptides encoded by the clone inserts of the deposited clone pool may be identified using assays such 
as the following. The molecule to be tested for binding is labeled with a detectable label, such as a 
fluorescent, radioactive, or enzymatic tag and placed in contact with immobilized GENSET protein, or a 
fragment thereof under conditions which permit specific binding to occur. After removal of 
non-specifically bound molecules, bound molecules are detected using appropriate means. 

Various candidate substances or molecules can be assayed for interaction with a GENSET 
polypeptide. These substances or molecules include, without being limited to, natural or synthetic 
organic compounds or molecules of biological origin such as polypeptides. When the candidate 
substance or molecule comprises a polypeptide, this polypeptide may be the resulting expression product 
of a phage clone belonging to a phage-based random peptide library, or alternatively the polypeptide may 
be the resulting expression product of a cDNA library cloned in a vector suitable for performing a 
two-hybrid screening assay. 

A. Candidate ligands obtained from random peptide libraries 

In a particular embodiment of the screening method, the putative ligand is the expression product 
of a DNA insert contained in a phage vector (Parmley and Smith, 1988). Specifically, random peptide 
phage libraries are used. The random DNA inserts encode peptides of 8 to 20 amino acids in length 
(Oldenburg et al., 1992; Valadon et al., 1996; Lucas, 1994; Westerink, 1995; Felici et al., 1991), which 
disclosures are hereby incorporated by reference in their entireties. According to this particular 
embodiment, the recombinant phages expressing a protein that binds to an immobilized GENSET protein 
are retained and the complex formed between the GENSET protein and the recombinant phage may be 
subsequently immunoprecipitated by a polyclonal or a monoclonal antibody directed against the GENSET 
protein. 
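
To give a sense of scale for random inserts encoding peptides of 8 to 20 amino acids, the following sketch counts the number of distinct peptide sequences theoretically possible at each length. It assumes the 20 standard amino acids at every position and ignores codon bias and stop codons; the function name is invented for this illustration.

```python
def peptide_library_diversity(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptides of a given length over a fixed alphabet."""
    if length < 1 or alphabet_size < 1:
        raise ValueError("length and alphabet_size must be positive")
    return alphabet_size ** length


# Theoretical diversity for each insert length from 8 to 20 residues.
diversity_by_length = {n: peptide_library_diversity(n) for n in range(8, 21)}
```

Under these assumptions an 8-residue library already has about 2.6 x 10^10 possible members, so a physical library is likely to sample only part of the theoretical sequence space at the longer lengths.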

Once the ligand library in recombinant phages has been constructed, the phage population is 
brought into contact with the immobilized GENSET protein. Then the preparation of complexes is 
washed in order to remove the non-specifically bound recombinant phages. The phages that bind 
specifically to the GENSET protein are then eluted by a buffer (acid pH) or immunoprecipitated by the 
monoclonal antibody produced by the hybridoma anti-GENSET, and this phage population is 
subsequently amplified by an over-infection of bacteria (for example E. coli). The selection step may be 
repeated several times, preferably 2-4 times, in order to select the more specific recombinant phage 
clones. The last step comprises characterizing the peptide produced by the selected recombinant phage 
clones either by expression in infected bacteria and isolation, expressing the phage insert in another 
host-vector system, or sequencing the insert contained in the selected recombinant phages. 

B. Candidate ligands obtained by competition experiments. 

Alternatively, peptides, drugs or small molecules which bind to a GENSET protein or fragment 
thereof comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, 
more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a polypeptide selected from the 
group consisting of sequences of SEQ ID Nos: 242-482, mature polypeptides included in SEQ ID Nos: 
242-272 and 274-384, as well as full-length and mature polypeptides encoded by the clone inserts of the 
deposited clone pool, may be identified in competition experiments. In such assays, the GENSET 
protein, or a fragment thereof, is immobilized to a surface, such as a plastic plate. Increasing amounts of 
the peptides, drugs or small molecules are placed in contact with the immobilized GENSET protein, or a 
fragment thereof, in the presence of a detectably labeled known GENSET protein ligand. For example, 
the GENSET ligand may be detectably labeled with a fluorescent, radioactive, or enzymatic tag. The 
ability of the test molecule to bind the GENSET protein, or a fragment thereof, is determined by 
measuring the amount of detectably labeled known ligand bound in the presence of the test molecule. A 
decrease in the amount of known ligand bound to the GENSET protein, or a fragment thereof, when the 
test molecule is present indicates that the test molecule is able to bind to the GENSET protein, or a 
fragment thereof. 
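
The read-out of such a competition experiment can be summarized numerically. The sketch below is a minimal illustration, not a prescribed analysis: it converts the signal of bound labeled ligand into a percent decrease relative to a control without test molecule, and interpolates on a logarithmic concentration scale the concentration giving a 50% decrease (often called an IC50, a term not used in the present text). The function names, the background parameter and all example values are hypothetical.

```python
import math


def percent_inhibition(signal_with_test, signal_without_test, background=0.0):
    """Percent decrease in bound labeled ligand caused by the test molecule."""
    specific_max = signal_without_test - background
    if specific_max <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - (signal_with_test - background) / specific_max)


def estimate_ic50(concentrations, signals, signal_without_test, background=0.0):
    """Interpolate the concentration giving 50% inhibition.

    Concentrations must be positive. Interpolation is linear in
    log10(concentration) between the two points that bracket 50%.
    Returns None when the series never crosses 50%.
    """
    previous = None
    for conc, signal in sorted(zip(concentrations, signals)):
        inhibition = percent_inhibition(signal, signal_without_test, background)
        if previous is not None:
            prev_conc, prev_inhibition = previous
            if prev_inhibition < 50.0 <= inhibition:
                fraction = (50.0 - prev_inhibition) / (inhibition - prev_inhibition)
                log_ic50 = math.log10(prev_conc) + fraction * (
                    math.log10(conc) - math.log10(prev_conc)
                )
                return 10.0 ** log_ic50
        previous = (conc, inhibition)
    return None


# Hypothetical dose series (arbitrary concentration units, arbitrary signal units).
doses = [1.0, 10.0, 100.0, 1000.0]
bound_label = [900.0, 700.0, 300.0, 100.0]
ic50 = estimate_ic50(doses, bound_label, signal_without_test=1000.0)
```

With these invented values the 50% point falls between the second and third doses, at roughly 32 concentration units.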

C. Candidate ligands obtained by affinity chromatography. 

Proteins or other molecules interacting with a GENSET protein, or a fragment thereof comprising 
a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at 
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a polypeptide selected from the group consisting of 
sequences of SEQ ID Nos: 242-482, mature polypeptides included in SEQ ID Nos: 242-272 and 274-384, 
as well as full-length and mature polypeptides encoded by the clone inserts of the deposited clone pool, 
can also be found using affinity columns which contain the GENSET protein, or a fragment thereof. The 
GENSET protein, or a fragment thereof, may be attached to the column using conventional techniques 
including chemical coupling to a suitable column matrix such as agarose, Affi-Gel®, or other matrices 
familiar to those of skill in the art. In some embodiments of this method, the affinity column contains 
chimeric proteins in which the GENSET protein, or a fragment thereof, is fused to glutathione 
S-transferase (GST). A mixture of cellular proteins or pool of expressed proteins as described above is 
applied to the affinity column. Proteins or other molecules interacting with the GENSET protein, or a 
fragment thereof, attached to the column can then be isolated and analyzed on 2-D electrophoresis gel as 
described in Ramunsen et al. (1997), the disclosure of which is incorporated by reference. Alternatively, 
the proteins retained on the affinity column can be purified by electrophoresis based methods and 
sequenced. The same method can be used to isolate antibodies, to screen phage display products, or to 
screen phage display human antibodies. 

D. Candidate ligands obtained by optical biosensor methods 

Proteins interacting with a GENSET protein, or a fragment comprising a contiguous span of at 
least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 
50, or 100 amino acids of a polypeptide selected from the group consisting of sequences of SEQ ID Nos: 
242-482, mature polypeptides included in SEQ ID Nos: 242-272 and 274-384, as well as full-length and 
mature polypeptides encoded by the clone inserts of the deposited clone pool, can also be screened by 
using an Optical Biosensor as described in Edwards and Leatherbarrow (1997) and also in Szabo et al. 
(1995), the disclosures of which are incorporated by reference. This technique permits the detection of 
interactions between molecules in real time, without the need of labeled molecules. This technique is 
based on the surface plasmon resonance (SPR) phenomenon. Briefly, the candidate ligand molecule to be 
tested is attached to a surface (such as a carboxymethyl dextran matrix). A light beam is directed towards 
the side of the surface that does not contain the sample to be tested and is reflected by said surface. The 
SPR phenomenon causes a decrease in the intensity of the reflected light with a specific association of 
angle and wavelength. The binding of candidate ligand molecules causes a change in the refractive index 
on the surface, which change is detected as a change in the SPR signal. For screening of candidate ligand 
molecules or substances that are able to interact with the GENSET protein, or a fragment thereof, the 
GENSET protein, or a fragment thereof, is immobilized onto a surface. This surface comprises one side 
of a cell through which flows the candidate molecule to be assayed. The binding of the candidate 
molecule on the GENSET protein, or a fragment thereof, is detected as a change of the SPR signal. The 
candidate molecules tested may be proteins, peptides, carbohydrates, lipids, or small molecules generated 
by combinatorial chemistry. This technique may also be performed by immobilizing eukaryotic or 
prokaryotic cells or lipid vesicles exhibiting an endogenous or a recombinantly expressed GENSET 
protein at their surface. 

The main advantage of the method is that it allows the determination of the association rate 
between the GENSET protein and molecules interacting with the GENSET protein. It is thus possible to 
select specifically ligand molecules interacting with the GENSET protein, or a fragment thereof, through 
strong or conversely weak association constants. 
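
The relation between association rate and association constant can be sketched with a simple 1:1 binding model. That model is an assumption made here for illustration and is not specified in the present text; the rate constants, concentrations and function names below are likewise hypothetical.

```python
import math


def association_constant(ka: float, kd: float) -> float:
    """Equilibrium association constant KA (1/M) = ka / kd."""
    if kd <= 0:
        raise ValueError("kd must be positive")
    return ka / kd


def dissociation_constant(ka: float, kd: float) -> float:
    """Equilibrium dissociation constant KD (M) = kd / ka."""
    if ka <= 0:
        raise ValueError("ka must be positive")
    return kd / ka


def association_response(t: float, conc: float, ka: float, kd: float, rmax: float) -> float:
    """Response at time t (s) while analyte at `conc` (M) flows over the surface.

    1:1 model: R(t) = Req * (1 - exp(-(ka*conc + kd) * t)),
    with Req = rmax * ka*conc / (ka*conc + kd).
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


# Hypothetical kinetic parameters for a candidate ligand.
ka_example = 1.0e5   # association rate constant, 1/(M*s)
kd_example = 1.0e-3  # dissociation rate constant, 1/s
kd_eq = dissociation_constant(ka_example, kd_example)
plateau = association_response(1.0e5, kd_eq, ka_example, kd_example, rmax=100.0)
```

In this model, injecting the analyte at a concentration equal to KD gives a plateau of half the maximal response, which is one way strong and weak binders could be ranked.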

E. Candidate ligands obtained through a two-hybrid screening assay. 

The yeast two-hybrid system is designed to study protein-protein interactions in vivo (Fields and 
Song, 1989), which disclosure is hereby incorporated by reference in its entirety, and relies upon the 
fusion of a bait protein to the DNA binding domain of the yeast Gal4 protein. This technique is also 
described in the US Patent N° US 5,667,973 and the US Patent N° 5,283,173, the technical teachings of 
both patents being herein incorporated by reference. 

The general procedure of library screening by the two-hybrid assay may be performed as 
described by Harper et al. (1993) or as described by Cho et al. (1998) or also Fromont-Racine et al. 
(1997), which disclosures are hereby incorporated by reference in their entireties. 

The bait protein or polypeptide comprises, consists essentially of, or consists of a GENSET 
polypeptide or a fragment thereof comprising a contiguous span of at least 6 amino acids, preferably at 
least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a 
polypeptide selected from the group consisting of sequences of SEQ ID Nos: 242-482, mature 
polypeptides included in SEQ ID Nos: 242-272 and 274-384, as well as full-length and mature 
polypeptides encoded by the clone inserts of the deposited clone pool. 

More precisely, the nucleotide sequence encoding the GENSET polypeptide or a fragment or 
variant thereof is fused to a polynucleotide encoding the DNA binding domain of the GAL4 protein, the 
fused nucleotide sequence being inserted in a suitable expression vector, for example pAS2 or pM3. 

Then, a human cDNA library is constructed in a specially designed vector, such that the human 
cDNA insert is fused to a nucleotide sequence in the vector that encodes the transcriptional activation 
domain of the GAL4 protein. Preferably, the vector used is the pACT vector. The polypeptides encoded by 
the nucleotide inserts of the human cDNA library are termed "prey" polypeptides. 

A third vector contains a detectable marker gene, such as a beta galactosidase gene or a CAT gene, 
that is placed under the control of a regulation sequence that is responsive to the binding of a complete 
Gal4 protein containing both the transcriptional activation domain and the DNA binding domain. For 
example, the vector pG5EC may be used. 

Two different yeast strains are also used. As an illustrative but non-limiting example, the two 
different yeast strains may be the following: 

- Y190, the genotype of which is (MATa, Leu2-3, 112 ura3-12, trp1-901, his3-D200, ade2-101, 
gal4D gal80D URA3 GAL-LacZ, LYS GAL-HIS3, cyh^r); 

- Y187, the genotype of which is (MATα gal4 gal80 his3 trp1-901 ade2-101 ura3-52 leu2-3,-112 
URA3 GAL-lacZ met-), which is the opposite mating type of Y190. 

Briefly, 20 µg of pAS2/GENSET and 20 µg of pACT-cDNA library are co-transformed into yeast 
strain Y190. The transformants are selected for growth on minimal media lacking histidine, leucine and 
tryptophan, but containing the histidine synthesis inhibitor 3-AT (50 mM). Positive colonies are screened 
for beta galactosidase by filter lift assay. The double positive colonies (His+, beta-gal+) are then grown on 
plates lacking histidine and leucine, but containing tryptophan and cycloheximide (10 mg/ml) to select for 
loss of pAS2/GENSET plasmids but retention of pACT-cDNA library plasmids. The resulting Y190 
strains are mated with Y187 strains expressing GENSET or non-related control proteins, such as 
cyclophilin B, lamin, or SNF1, as Gal4 fusions as described by Harper et al. (1993) and by Bram et al. 
(1993), which disclosures are hereby incorporated by reference in their entireties, and screened for beta 
galactosidase by filter lift assay. Yeast clones that are beta-gal positive after mating with the control Gal4 
fusions are considered false positives. 

In another embodiment of the two-hybrid method according to the invention, interaction of 
the GENSET protein or a fragment or variant thereof with cellular proteins may be assessed using the 
Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual 
accompanying the kit, the disclosure of which is incorporated herein by reference, nucleic acids encoding 
the GENSET protein or a portion thereof, are inserted into an expression vector such that they are in 
frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. A 
desired cDNA, preferably human cDNA, is inserted into a second expression vector such that it is in 
frame with DNA encoding the activation domain of GAL4. The two expression plasmids are transformed 
into yeast and the yeast are plated on selection medium which selects for expression of selectable markers on 
each of the expression vectors as well as GAL4 dependent expression of the HIS3 gene. Transformants 
capable of growing on medium lacking histidine are screened for GAL4 dependent lacZ expression. Those 
cells which are positive in both the histidine selection and the lacZ assay contain an interaction between 
GENSET and the protein or peptide encoded by the initially selected cDNA insert. 

Compounds Modulating GENSET biological activity 

Another method of screening for compounds that modulate GENSET gene expression and/or 
biological activity is by measuring the effects of test compounds on a given cellular property in a host 
cell, such as apoptosis, proliferation, differentiation, protein glycosylation, etc., using a variety of 
techniques known to those skilled in the art including those described herein. 

In one embodiment, the present invention relates to a method of identifying an agent which alters 
GENSET activity, wherein a nucleic acid construct comprising a nucleic acid which encodes a 
mammalian GENSET polypeptide is introduced into a host cell. The host cells produced are maintained 
under conditions appropriate for expression of the encoded mammalian GENSET polypeptides, whereby 
the nucleic acid is expressed. The host cells are then contacted with a compound to be assessed (an agent) 
and the given cellular property of the cells is detected in the presence of the compound to be assessed. 
Detection of a change in the given cellular property in the presence of the agent indicates that the agent 
alters GENSET activity. 

In a particular embodiment, the invention relates to a method of identifying an agent which is an 
activator of GENSET activity, wherein detection of a change of the given cellular property in the 
presence of the agent indicates that the agent activates GENSET activity. In another particular 
embodiment, the invention relates to a method of identifying an agent which is an inhibitor of GENSET 
activity, wherein detection of a change of the given cellular property in the presence of the agent indicates 
that the agent inhibits GENSET activity. 

Methods of Screening for Compounds Modulating GENSET Expression and/or Activity 

The present invention also relates to methods of screening compounds for their ability to 
modulate (e.g. increase or inhibit) the activity or expression of GENSET. More specifically, the present 
invention relates to methods of testing compounds for their ability either to increase or to decrease 
expression or activity of GENSET. The assays are performed in vitro or in vivo. 

In vitro methods 

In vitro, cells expressing GENSET are incubated in the presence and absence of the test 
compound. By comparing the level of GENSET expression or the level of GENSET activity in the 
presence and in the absence of the test compound, compounds can be identified that 
suppress or enhance GENSET expression or activity. Alternatively, constructs comprising a GENSET 
regulatory sequence operably linked to a reporter gene (e.g. luciferase, chloramphenicol acetyl 
transferase, LacZ, green fluorescent protein, etc.) can be introduced into host cells and the effect of the 
test compounds on expression of the reporter gene detected. Cells suitable for use in the foregoing assays 
include, but are not limited to, cells having the same origin as tissues or cell lines in which the 
polypeptide is known to be expressed according to the data from Table K. 

Consequently, the present invention encompasses a method for screening molecules that 
modulate the expression of a GENSET gene, said screening method comprising the steps of: 

a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a nucleotide 
sequence encoding a GENSET protein or a variant or a fragment thereof, placed under the control of its 
own promoter; 

b) bringing into contact said cultivated cell with a molecule to be tested; 

c) quantifying the expression of said GENSET protein or a variant or a fragment thereof in the 
presence of said molecule. 

Using DNA recombination techniques well known to one skilled in the art, the GENSET 
protein encoding DNA sequence is inserted into an expression vector, downstream from its promoter 
sequence. As an illustrative example, the promoter sequence of the GENSET gene is contained in the 5' 
untranscribed region of the GENSET genomic DNA. 

The quantification of the expression of a GENSET protein may be performed either at the mRNA 
level (using for example Northern blots, RT-PCR, preferably quantitative RT-PCR with primers and 
probes specific for the GENSET mRNA of interest) or at the protein level (using polyclonal or 
monoclonal antibodies in immunoassays such as ELISA or RIA assays, Western blots, or 
immunochemistry). 
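
For quantitative RT-PCR, one commonly used convention for relative quantification is the comparative threshold-cycle (2^-ddCt) calculation. It is not described in the present text and is offered only as a sketch, assuming roughly 100% amplification efficiency and a stably expressed reference transcript. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_reference_treated: float,
                        ct_target_control: float, ct_reference_control: float) -> float:
    """Fold change of the target mRNA in treated versus control cells.

    Each target Ct is first normalized to the reference transcript measured
    in the same sample, then the treated sample is compared with the control.
    """
    delta_treated = ct_target_treated - ct_reference_treated
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in treated cells while the reference transcript is unchanged.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

With these invented values the calculation yields a four-fold higher level of the target mRNA in the presence of the tested molecule.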

The present invention also concerns a method for screening substances or molecules that are able 
to increase, or in contrast to decrease, the level of expression of a GENSET gene. Such a method may 
allow one skilled in the art to select substances exerting a regulating effect on the expression level of a 
GENSET gene and which may be useful as active ingredients included in pharmaceutical compositions 
for treating patients suffering from disorders associated with abnormal levels of GENSET products. 

Thus, also part of the present invention is a method for screening a candidate molecule that 
modulates the expression of a GENSET gene, said method comprising the following steps: 

a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid 
comprises a GENSET 5' regulatory region or a regulatory active fragment or variant thereof, operably 
linked to a polynucleotide encoding a detectable protein; 

b) obtaining a candidate molecule; and 

c) determining the ability of said candidate molecule to modulate the expression levels of said 
polynucleotide encoding the detectable protein. 
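
Step c) can be illustrated by a minimal sketch that compares reporter signal from cells exposed to a candidate molecule with signal from untreated cells. The two-fold threshold, the function names and the replicate readings are hypothetical choices made for this example and are not prescribed by the method.

```python
from statistics import mean


def reporter_fold_change(treated: list[float], untreated: list[float]) -> float:
    """Mean reporter signal with the candidate divided by the mean without it."""
    if not treated or not untreated:
        raise ValueError("both replicate lists must be non-empty")
    baseline = mean(untreated)
    if baseline <= 0:
        raise ValueError("untreated signal must be positive")
    return mean(treated) / baseline


def classify_modulator(fold_change: float, threshold: float = 2.0) -> str:
    """Label a candidate by the direction of its effect on reporter expression."""
    if threshold <= 1.0:
        raise ValueError("threshold must be greater than 1")
    if fold_change >= threshold:
        return "increases expression"
    if fold_change <= 1.0 / threshold:
        return "decreases expression"
    return "no clear effect"


# Hypothetical triplicate readings of a detectable protein (arbitrary units).
fold_change = reporter_fold_change([300.0, 320.0, 280.0], [100.0, 110.0, 90.0])
label = classify_modulator(fold_change)
```

In practice the threshold and the number of replicates would be chosen according to the variability of the particular reporter assay.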

In a further embodiment, said nucleic acid comprising a GENSET 5' regulatory region or a 
regulatory active fragment or variant thereof, includes the 5'UTR region of a GENSET cDNA selected 
from the group comprising the 5'UTRs of the sequences of SEQ ID Nos 1-241, sequences of clone 
inserts of the deposited clone pool, regulatory active fragments and variants thereof. In a more preferred 
embodiment of the above screening method, said nucleic acid includes a promoter sequence which is 
endogenous with respect to the GENSET 5'UTR sequence. In another more preferred embodiment of the 
above screening method, said nucleic acid includes a promoter sequence which is exogenous with respect 
to the GENSET 5'UTR sequence defined therein. 

Preferred polynucleotides encoding a detectable protein are polynucleotides encoding beta 
galactosidase, green fluorescent protein (GFP) and chloramphenicol acetyl transferase (CAT). 

The invention further relates to a method for the production of a pharmaceutical composition 
comprising a method of screening a candidate molecule that modulates the expression of a GENSET gene 
and furthermore mixing the identified molecule with a pharmaceutically acceptable carrier. 

The invention also pertains to kits for the screening of a candidate substance modulating the 
expression of a GENSET gene. Preferably, such kits comprise a recombinant vector that allows the 
expression of a GENSET 5' regulatory region or a regulatory active fragment or a variant thereof, 
operably linked to a polynucleotide encoding a detectable protein or a GENSET protein or a fragment or a 
variant thereof. More preferably, such kits include a recombinant vector that comprises a nucleic acid 
including the 5'UTR region of a GENSET cDNA selected from the group comprising the 5'UTRs of the 
sequences of SEQ ID Nos 1-241, sequences of clone inserts of the deposited clone pool, regulatory 
active fragments and variants thereof, being operably linked to a polynucleotide encoding a detectable 
protein. 

For the design of suitable recombinant vectors useful for performing the screening methods
described above, reference is made to the section of the present specification wherein the preferred
recombinant vectors of the invention are detailed.

Another object of the present invention comprises methods and kits for the screening of candidate
substances that interact with a GENSET polypeptide, fragments or variants thereof. By their capacity to
bind covalently or non-covalently to a GENSET protein, fragments or variants thereof, these substances
or molecules may be advantageously used both in vitro and in vivo.

In vitro, said interacting molecules may be used as detection means in order to identify the
presence of a GENSET protein in a sample, preferably a biological sample.

A method for the screening of a candidate substance that interacts with a GENSET polypeptide,
fragments or variants thereof, comprises the following steps:

a) providing a polypeptide comprising, consisting essentially of, or consisting of a GENSET
protein or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10
amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a polypeptide
selected from the group consisting of sequences of SEQ ID Nos: 242-482, mature polypeptides included in SEQ ID
Nos: 242-272 and 274-384 as well as full-length and mature polypeptides encoded by the clone inserts of the
deposited clone pool;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance;

d) detecting the complexes formed between said polypeptide and said candidate substance.

The invention further relates to a method for the production of a pharmaceutical composition
comprising a method for the screening of a candidate substance that interacts with a GENSET polypeptide,
fragments or variants thereof and furthermore mixing the identified substance with a pharmaceutically
acceptable carrier.

The invention further concerns a kit for the screening of a candidate substance interacting with 
the GENSET polypeptide, wherein said kit comprises: 

a) a polypeptide comprising, consisting essentially of, or consisting of a GENSET protein or a
fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids,
more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a polypeptide selected from the
group consisting of sequences of SEQ ID Nos: 242-482, mature polypeptides included in SEQ ID Nos: 242-272
and 274-384 as well as full-length and mature polypeptides encoded by the clone inserts of the deposited clone pool;
and

b) optionally means useful to detect the complex formed between said polypeptide or a variant
thereof and the candidate substance.

In a preferred embodiment of the kit described above, the detection means comprises a 
monoclonal or polyclonal antibody binding to said GENSET protein or fragment or variant thereof. 

In vivo methods

Compounds that suppress or enhance GENSET expression can also be identified using in vivo
screens. In these assays, the test compound is administered (e.g., IV, IP, IM, orally, or otherwise) to the
animal, for example, at a variety of dose levels. The effect of the compound on GENSET expression is
determined by comparing GENSET levels, for example in tissues known to express the gene of interest
using, for example, the data obtained in Table IX, and using Northern blots, immunoassays, PCR, etc., as
described above. Suitable test animals include rodents (e.g., mice and rats), primates, and other mammals.
Humanized mice can also be used as test animals, that is, mice in which the endogenous mouse protein is
ablated (knocked out) and the homologous human protein added back by standard transgenic approaches.
Such mice express only the human form of a protein. Humanized mice expressing only the human
GENSET can be used to study in vivo responses to potential agents regulating GENSET protein or
mRNA levels. As an example, transgenic mice have been produced carrying the human apoE4 gene. They
are then bred with a mouse line that lacks endogenous apoE, to produce an animal model carrying human
proteins believed to be instrumental in development of Alzheimer's pathology. Such transgenic animals
are useful for dissecting the biochemical and physiological steps of disease, and for development of
therapies for disease intervention (Loring et al., 1996) (incorporated herein by reference in its entirety).

Uses for compounds modulating GENSET expression and/or biological activity 

Using in vivo (or in vitro) systems, it may be possible to identify compounds that exert a tissue
specific effect, for example, that increase GENSET expression or activity only in tissues of interest.
Screening procedures such as those described above are also useful for identifying agents for their
potential use in pharmacological intervention strategies. Agents that enhance GENSET expression or
stimulate its activity may thus be used to treat disorders which require upregulated levels of GENSET
gene expression and/or activity. Compounds that suppress GENSET expression or inhibit its activity can
be used to treat disorders which require downregulated levels of GENSET gene expression and/or
activity.

Also encompassed by the present invention is an agent which interacts with GENSET directly or
indirectly, and inhibits or enhances GENSET expression and/or function. In one embodiment, the agent is
an inhibitor which interferes with GENSET directly (e.g., by binding GENSET) or indirectly (e.g., by
blocking the ability of GENSET to have a GENSET biological activity). In a particular embodiment, an
inhibitor of GENSET protein is an antibody specific for GENSET protein or a functional portion of
GENSET; that is, the antibody binds a GENSET polypeptide. For example, the antibody can be specific
for a polypeptide encoded by one of the amino acid sequences of human GENSET genes (SEQ ID Nos:
242-482, mature polypeptides included in SEQ ID Nos: 242-272 and 274-384, full-length and mature
polypeptides encoded by the clone inserts of the deposited clone pool), mammal GENSET or portions
thereof. Alternatively, the inhibitor can be an agent other than an antibody (e.g., small organic molecule,
protein or peptide) which binds GENSET and blocks its activity. For example, the inhibitor can be an
agent which mimics GENSET structurally, but lacks its function. Alternatively, it can be an agent which
binds to or interacts with a molecule which GENSET normally binds with or interacts with, thus blocking
GENSET from doing so and preventing it from exerting the effects it would normally exert.

In another embodiment, the agent is an enhancer (activator) of GENSET which increases the
activity of GENSET (increases the effect of a given amount or level of GENSET), increases the length of
time it is effective (by preventing its degradation or otherwise prolonging the time during which it is
active), or both, either directly or indirectly.

The GENSET sequences of the present invention can also be used to generate nonhuman gene
knockout animals, such as mice, which lack a GENSET gene or transgenically overexpress GENSET.
For example, such GENSET gene knockout mice can be generated and used to obtain further insight into
the function of GENSET as well as assess the specificity of GENSET activators and inhibitors. Also,
overexpression of GENSET (e.g., human GENSET) in transgenic mice can be used as a means of
creating a test system for GENSET activators and inhibitors (e.g., against human GENSET). In addition,
the GENSET gene can be used to clone the GENSET promoter/enhancer in order to identify regulators of
GENSET transcription. GENSET gene knockout animals include animals which completely or partially
lack the GENSET gene and/or GENSET activity or function. Thus the present invention relates to a
method of inhibiting (partially or completely) GENSET biological activity in a mammal (e.g., human)
comprising administering to the mammal an effective amount of an inhibitor of GENSET. The invention
also relates to a method of enhancing GENSET biological activity in a mammal comprising administering
to the mammal an effective amount of an enhancer of GENSET.

Inhibiting GENSET expression 

Therapeutic compositions according to the present invention may advantageously comprise one
or several GENSET oligonucleotide fragments as an antisense tool or a triple helix tool that inhibits the
expression of the corresponding GENSET gene.

Antisense Approach

In antisense approaches, nucleic acid sequences complementary to an mRNA are hybridized to
the mRNA intracellularly, thereby blocking the expression of the protein encoded by the mRNA. The
antisense nucleic acid molecules to be used in gene therapy may be either DNA or RNA sequences.
Preferred methods using antisense polynucleotides according to the present invention are the procedures
described by Sczakiel et al. (1995), which disclosure is hereby incorporated by reference in its entirety.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are
complementary to GENSET mRNA, more preferably to the 5' end of the GENSET mRNA. In another
embodiment, a combination of different antisense polynucleotides complementary to different parts of the
desired targeted gene is used.

Other preferred antisense polynucleotides according to the present invention are sequences 
complementary to either a sequence of GENSET mRNAs comprising the translation initiation codon 
ATG or a sequence of GENSET genomic DNA containing a splicing donor or acceptor site. 
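
As a rough illustration of how an antisense oligonucleotide spanning the translation initiation codon could be derived from an mRNA sequence, the following Python sketch may help. It is not part of the disclosed procedures: the function names and the example sequence are hypothetical, and the sketch assumes that the first ATG found in the sequence is the initiation codon, which would need to be confirmed for any real GENSET mRNA.

```python
# Hypothetical helper names; the example sequence is invented, not a GENSET sequence.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")  # A->T, C->G, G->C, T->A, U->A


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_over_start_codon(mrna: str, length: int = 20) -> str:
    """Return an antisense DNA oligo whose target window contains the first ATG.

    Assumption: the first ATG/AUG in `mrna` is the translation initiation codon.
    """
    if not 15 <= length <= 200:
        raise ValueError("length outside the 15-200 nt range given in the text")
    seq = mrna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG initiation codon found")
    flank = (length - 3) // 2                   # bases wanted upstream of the ATG
    left = max(0, start - flank)                # clip at the 5' end of the mRNA
    left = min(left, max(0, len(seq) - length)) # keep a full-length window if possible
    target = seq[left:left + length]            # sense-strand window on the mRNA
    return reverse_complement(target)


if __name__ == "__main__":
    example_mrna = "GGCACCAUGGCUAGCUUAGGCCAUUGA"  # invented example
    print(antisense_over_start_codon(example_mrna, length=20))
```

Because the oligo is the reverse complement of the target window, it contains CAT opposite the ATG codon. The same `reverse_complement` helper would apply to a window placed over a splice donor or acceptor site in genomic DNA.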

Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation signal that
has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II transcripts are
produced without poly(A) at their 3' ends, these antisense polynucleotides being incapable of export from
the nucleus, such as described by Liu et al. (1994), which disclosure is hereby incorporated by reference in
its entirety. In a preferred embodiment, these GENSET antisense polynucleotides also comprise, within
the ribozyme cassette, a histone stem-loop structure to stabilize cleaved transcripts against 3'-5'
exonucleolytic degradation, such as the structure described by Eckner et al. (1991), which disclosure is
hereby incorporated by reference in its entirety.

The antisense nucleic acids should have a length and melting temperature sufficient to permit
formation of an intracellular duplex having sufficient stability to inhibit the expression of the GENSET
mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene therapy are
disclosed in Green et al. (1986) and Izant and Weintraub (1984), the disclosures of which are
incorporated herein by reference.
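
For a first-pass check of length and melting temperature, a simple rule of thumb can be sketched in Python. The Wallace rule used here (2 degrees C per A or T, 4 degrees C per G or C) is a common approximation for short oligonucleotides and is not taken from the present specification or the cited references; it ignores salt, concentration, backbone modifications, and intracellular conditions, so it should be read only as a crude screen.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (degrees C) by the Wallace rule.

    Tm ~= 2 * (A + T) + 4 * (G + C); a rough estimate for short oligos only.
    """
    seq = oligo.upper().replace("U", "T")
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Invented 16-mer with 8 A/T and 8 G/C bases.
    print(wallace_tm("ATGCATGCATGCATGC"))
```

Longer antisense molecules are better served by nearest-neighbor thermodynamic models, which this sketch does not attempt.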

In some strategies, antisense molecules are obtained by reversing the orientation of the GENSET
coding region with respect to a promoter so as to transcribe the opposite strand from that which is
normally transcribed in the cell. The antisense molecules may be transcribed using in vitro transcription
systems such as those which employ T7 or SP6 polymerase to generate the transcript. Another approach
involves transcription of GENSET antisense nucleic acids in vivo by operably linking DNA containing
the antisense sequence to a promoter in a suitable expression vector.

Alternatively, oligonucleotides which are complementary to the strand normally transcribed in the
cell may be synthesized in vitro. Thus, the antisense nucleic acids are complementary to the
corresponding mRNA and are capable of hybridizing to the mRNA to create a duplex. In some
embodiments, the antisense sequences may contain modified sugar phosphate backbones to increase
stability and make them less sensitive to RNase activity. Examples of modifications suitable for use in
antisense strategies include 2' O-methyl RNA oligonucleotides and peptide nucleic acid (PNA)
oligonucleotides. Further examples are described by Rossi et al. (1991), which disclosure is hereby
incorporated by reference in its entirety.

Various types of antisense oligonucleotides complementary to the sequence of the GENSET
cDNA or genomic DNA may be used. In one preferred embodiment, stable and semi-stable antisense
oligonucleotides described in International Application No. PCT WO94/23026, hereby incorporated by
reference, are used. In these molecules, the 3' end or both the 3' and 5' ends are engaged in
intramolecular hydrogen bonding between complementary base pairs. These molecules are better able to
withstand exonuclease attacks and exhibit increased stability compared to conventional antisense
oligonucleotides.

In another preferred embodiment, the antisense oligodeoxynucleotides against herpes simplex
virus types 1 and 2 described in International Application No. WO 95/04141, hereby incorporated by
reference, are used.

In yet another preferred embodiment, the covalently cross-linked antisense oligonucleotides
described in International Application No. WO 96/31523, hereby incorporated by reference, are used.
These double- or single-stranded oligonucleotides comprise one or more, respectively, inter- or
intra-oligonucleotide covalent cross-linkages, wherein the linkage consists of an amide bond between a primary
amine group of one strand and a carboxyl group of the other strand or of the same strand, respectively, the
primary amine group being directly substituted in the 2' position of the strand nucleotide monosaccharide
ring, and the carboxyl group being carried by an aliphatic spacer group substituted on a nucleotide or
nucleotide analog of the other strand or the same strand, respectively.

The antisense oligodeoxynucleotides and oligonucleotides disclosed in International Application
No. WO 92/18522, incorporated by reference, may also be used. These molecules are stable to
degradation and contain at least one transcription control recognition sequence which binds to control
proteins and are effective as decoys therefor. These molecules may contain "hairpin" structures,
"dumbbell" structures, "modified dumbbell" structures, "cross-linked" decoy structures and "loop"
structures.

In another preferred embodiment, the cyclic double-stranded oligonucleotides described in
European Patent Application No. 0 572 287 A2, hereby incorporated by reference, are used. These ligated
oligonucleotide "dumbbells" contain the binding site for a transcription factor and inhibit expression of
the gene under control of the transcription factor by sequestering the factor.

Use of the closed antisense oligonucleotides disclosed in International Application No. WO
92/19732, hereby incorporated by reference, is also contemplated. Because these molecules have no free
ends, they are more resistant to degradation by exonucleases than are conventional oligonucleotides.
These oligonucleotides may be multifunctional, interacting with several regions which are not adjacent to
the target mRNA.

The appropriate level of antisense nucleic acids required to inhibit gene expression may be
determined using in vitro expression analysis. The antisense molecule may be introduced into the cells by
diffusion, injection, infection or transfection using procedures known in the art. For example, the
antisense nucleic acids can be introduced into the body as a bare or naked oligonucleotide,
oligonucleotide encapsulated in lipid, oligonucleotide sequence encapsidated by viral protein, or as an
oligonucleotide operably linked to a promoter contained in an expression vector. The expression vector
may be any of a variety of expression vectors known in the art, including retroviral or viral vectors,
vectors capable of extrachromosomal replication, or integrating vectors. The vectors may be DNA or
RNA.

The antisense molecules are introduced onto cell samples at a number of different concentrations,
preferably between 1 x 10^-10 M and 1 x 10^-4 M. Once the minimum concentration that can adequately control
gene expression is identified, the optimized dose is translated into a dosage suitable for use in vivo. For
example, an inhibiting concentration in culture of 1 x 10^-7 M translates into a dose of approximately 0.6
mg/kg body weight. Levels of oligonucleotide approaching 100 mg/kg body weight or higher may be
possible after testing the toxicity of the oligonucleotide in laboratory animals. It is additionally
contemplated that cells from the vertebrate are removed, treated with the antisense oligonucleotide, and
reintroduced into the vertebrate.
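
The concentration series and the conversion from a culture concentration to a body-weight dose can be sketched as simple arithmetic. The specification does not state how the figure of about 0.6 mg/kg is obtained; the sketch below reproduces it under two assumptions that are not in the source: an oligonucleotide molecular weight of about 6,000 g/mol (roughly an 18- to 20-mer) and a distribution volume of about 1 liter per kilogram of body weight. The function names are hypothetical.

```python
from typing import List


def concentration_series(low_exp: int = -10, high_exp: int = -4) -> List[float]:
    """One test concentration per decade, in mol/L, from 10**low_exp to 10**high_exp."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def dose_mg_per_kg(conc_molar: float,
                   mol_weight_g_per_mol: float = 6000.0,  # assumed, not from the source
                   volume_l_per_kg: float = 1.0) -> float:  # assumed, not from the source
    """Convert a culture concentration (mol/L) into an approximate dose in mg/kg."""
    grams_per_liter = conc_molar * mol_weight_g_per_mol
    return grams_per_liter * 1000.0 * volume_l_per_kg  # g/L -> mg/L -> mg/kg


if __name__ == "__main__":
    for c in concentration_series():
        print(f"{c:.0e} M -> {dose_mg_per_kg(c):.2e} mg/kg")
```

With these assumed values, 1 x 10^-7 mol/L times 6,000 g/mol gives 0.6 mg/L, that is, about 0.6 mg/kg. A different oligonucleotide length or distribution volume would change the result proportionally.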

In a preferred application of this invention, the polypeptide encoded by the gene is first identified,
so that the effectiveness of antisense inhibition on translation can be monitored using techniques that
include but are not limited to antibody-mediated tests such as RIAs and ELISA, functional assays, or
radiolabeling.

An alternative to the antisense technology that is used according to the present invention
comprises using ribozymes that will bind to a target sequence via their complementary polynucleotide tail
and that will cleave the corresponding RNA by hydrolyzing its target site (namely "hammerhead
ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme comprises (1) sequence-specific
binding to the target RNA via complementary antisense sequences; (2) site-specific hydrolysis of the
cleavable motif of the target strand; and (3) release of cleavage products, which gives rise to another
catalytic cycle. Indeed, the use of long-chain antisense polynucleotides (at least 30 bases long) or
ribozymes with long antisense arms is advantageous. A preferred delivery system for antisense
ribozymes is achieved by covalently linking these antisense ribozymes to lipophilic groups or by using
liposomes as a convenient vector. Preferred antisense ribozymes according to the present invention are
prepared as described by Rossi et al. (1991) and Sczakiel et al. (1995), the specific preparation procedures
being referred to in said articles being herein incorporated by reference.

Triple Helix Approach

The GENSET genomic DNA may also be used to inhibit the expression of the GENSET gene 
based on intracellular triple helix formation. 

Triple helix oligonucleotides are used to inhibit transcription from a genome. They are
particularly useful for studying alterations in cell activity when it is associated with a particular gene. The
GENSET cDNAs or genomic DNAs of the present invention or, more preferably, a fragment of those
sequences, can be used to inhibit gene expression in individuals having diseases associated with
expression of a particular gene. Similarly, a portion of the GENSET genomic DNA can be used to study
the effect of inhibiting GENSET transcription within a cell. Traditionally, homopurine sequences were
considered the most useful for triple helix strategies. However, homopyrimidine sequences can also
inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove at
homopurine:homopyrimidine sequences. Thus, both types of sequences from the GENSET genomic
DNA are contemplated within the scope of this invention.

To carry out gene therapy strategies using the triple helix approach, the sequences of the
GENSET genomic DNA are first scanned to identify 10-mer to 20-mer homopyrimidine or homopurine
stretches which could be used in triple-helix based strategies for inhibiting GENSET expression.
Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in
inhibiting GENSET expression is assessed by introducing varying amounts of oligonucleotides containing
the candidate sequences into tissue culture cells which express the GENSET gene.
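
The scanning step described above can be sketched with a regular expression search. This is only an illustration under stated assumptions: the function name and the example sequence are hypothetical, the search reports maximal uninterrupted runs of at least 10 bases on the given strand only, and no attempt is made to rank candidates by predicted triplex stability.

```python
import re
from typing import List, Tuple


def find_homo_stretches(dna: str, min_len: int = 10) -> List[Tuple[int, str, str]]:
    """Return (start, kind, sequence) for each maximal homopurine or
    homopyrimidine run of at least `min_len` bases on the given strand."""
    seq = dna.upper()
    hits = []
    for kind, base_class in (("homopurine", "[AG]"), ("homopyrimidine", "[CT]")):
        pattern = f"{base_class}{{{min_len},}}"  # e.g. "[AG]{10,}"
        for match in re.finditer(pattern, seq):
            hits.append((match.start(), kind, match.group()))
    return sorted(hits)


if __name__ == "__main__":
    # Invented sequence, not GENSET genomic DNA.
    example = "TTACGAGGAAGGAGAAGCATGCTTCCTCTTCCGA"
    for start, kind, stretch in find_homo_stretches(example):
        print(start, kind, len(stretch), stretch)
```

Runs longer than 20 bases would contain several overlapping 10-mer to 20-mer candidates, which could be enumerated by sliding a window along the reported stretch. Scanning the reverse complement as well would cover the opposite strand.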

The oligonucleotides can be introduced into the cells using a variety of methods known to those
skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran,
electroporation, liposome-mediated transfection or native uptake.

Treated cells are monitored for altered cell function or reduced GENSET expression using
techniques such as Northern blotting, RNase protection assays, or PCR based strategies to monitor the
transcription levels of the GENSET gene in cells which have been treated with the oligonucleotide. The
cell functions to be monitored are predicted based upon the homologies of the target gene corresponding
to the cDNA from which the oligonucleotide was derived with known gene sequences that have been
associated with a particular function. The cell functions can also be predicted based on the presence of
abnormal physiology within cells derived from individuals with a particular inherited disease, particularly
when the cDNA is associated with the disease using techniques described in the section entitled
"Identification of genes associated with hereditary diseases or drug response".

The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells may
then be introduced in vivo using the techniques and at a dosage calculated based on the in vitro results, as
described in the section entitled "Antisense Approach".

In some embodiments, the natural (beta) anomers of the oligonucleotide units can be replaced
with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating
agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to
stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix
formation see Griffin et al. (1989), which is hereby incorporated by this reference.

Treating GENSET-related disorders 

The present invention further relates to methods of treating diseases/disorders by increasing
GENSET activity and/or expression. The invention also relates to methods of treating diseases/disorders
by decreasing GENSET activity and/or expression. These methodologies can be effected using
compounds selected using screening protocols such as those described herein and/or by using the gene
therapy and antisense approaches described in the art and herein. Gene therapy can be used to effect
targeted expression of GENSET. The GENSET coding sequence can be cloned into an appropriate
expression vector and targeted to a particular cell type(s) to achieve efficient, high level expression.
Introduction of the GENSET coding sequence into target cells can be achieved, for example, using
particle mediated DNA delivery (Haynes, 1996 and Maurer, 1999), direct injection of naked DNA (Levy
et al., 1996; and Felgner, 1996), or viral vector mediated transport (Smith et al., 1996; Stone et al., 2000;
Wu and Atai, 2000), each of which disclosures is hereby incorporated by reference in its entirety.
Tissue specific effects can be achieved, for example, in the case of virus mediated transport by using viral
vectors that are tissue specific, or by the use of promoters that are tissue specific.

Combinatorial approaches can also be used to ensure that the GENSET coding sequence is
activated in the target tissue (Butt and Karathanasis, 1995; Miller and Whelan, 1997), which disclosures
are hereby incorporated by reference in their entireties. Antisense oligonucleotides complementary to
GENSET mRNA can be used to selectively diminish or ablate the expression of the protein, for example,
at sites of inflammation. More specifically, antisense constructs or antisense oligonucleotides can be used
to inhibit the production of GENSET in high expressing cells such as those cited in the third column of
Table X. Antisense mRNA can be produced by transfecting into target cells an expression vector with the
GENSET gene sequence, or portion thereof, oriented in an antisense direction relative to the direction of
transcription. Appropriate vectors include viral vectors, including retroviral, adenoviral, and
adeno-associated viral vectors, as well as nonviral vectors. Tissue specific promoters can be used.
Alternatively, antisense oligonucleotides can be introduced directly into target cells to achieve the same
goal. (See also other delivery methodologies described herein in connection with gene therapy.)
Oligonucleotides can be selected/designed to achieve a high level of specificity (Wagner et al., 1996),
which disclosure is hereby incorporated by reference in its entirety. The therapeutic methodologies
described herein are applicable to both human and non-human mammals (including cats and dogs).

Pharmaceutical and physiologically acceptable compositions 

The present invention also relates to pharmaceutical or physiologically acceptable compositions
comprising, as active agent, the polypeptides, nucleic acids or antibodies of the invention. The invention
also relates to compositions comprising, as active agent, compounds selected using the above-described
screening protocols. Such compositions include the active agent in combination with a pharmaceutically or
physiologically acceptable carrier. In the case of naked DNA, the "carrier" may be gold
particles. The amount of active agent in the composition can vary with the agent, the patient and the
effect sought. Likewise, the dosing regimen can vary depending on the composition and the
disease/disorder to be treated.

Therefore, the invention relates to methods for the production of a pharmaceutical composition
comprising a method for selecting an active agent, compound, substance or molecule using any of the
screening methods described herein and furthermore mixing the identified active agent, compound,
substance or molecule with a pharmaceutically acceptable carrier.

The pharmaceutical compositions utilized in this invention may be administered by any number
of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary,
intrathecal, intraventricular, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical,
sublingual, or rectal means. In addition to the active ingredients, these pharmaceutical compositions may
contain suitable pharmaceutically acceptable carriers comprising excipients and auxiliaries which
facilitate processing of the active compounds into preparations which can be used pharmaceutically.
Further details on techniques for formulation and administration may be found in the latest edition of
Remington's Pharmaceutical Sciences (Mack Publishing Co., Easton, Pa.).

Pharmaceutical compositions for oral administration can be formulated using pharmaceutically
acceptable carriers well known in the art in dosages suitable for oral administration. Such carriers enable
the pharmaceutical compositions to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups,
slurries, suspensions, and the like, for ingestion by the patient.

Pharmaceutical preparations for oral use can be obtained through a combination of active
compounds with solid excipient, optionally grinding the resulting mixture, and processing the mixture of
granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores. Suitable excipients
are carbohydrate or protein fillers, such as sugars, including lactose, sucrose, mannitol, or sorbitol; starch
from corn, wheat, rice, potato, or other plants; cellulose, such as methyl cellulose,
hydroxypropylmethylcellulose, or sodium carboxymethylcellulose; gums including arabic and tragacanth; and proteins such as
gelatin and collagen. If desired, disintegrating or solubilizing agents may be added, such as
cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such as sodium alginate.

Dragee cores may be used in conjunction with suitable coatings, such as concentrated sugar
solutions, which may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene
glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures.
Dyestuffs or pigments may be added to the tablets or dragee coatings for product identification or to
characterize the quantity of active compound, i.e., dosage.

Pharmaceutical preparations which can be used orally include push-fit capsules made of gelatin,
as well as soft, sealed capsules made of gelatin and a coating, such as glycerol or sorbitol. Push-fit
capsules can contain active ingredients mixed with a filler or binders, such as lactose or starches,
lubricants, such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the active
compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid, or
liquid polyethylene glycol with or without stabilizers.

Pharmaceutical formulations suitable for parenteral administration may be formulated in aqueous
solutions, preferably in physiologically compatible buffers such as Hanks' solution, Ringer's solution, or
physiologically buffered saline. Aqueous injection suspensions may contain substances which increase
the viscosity of the suspension, such as sodium carboxymethylcellulose, sorbitol, or dextran.
Additionally, suspensions of the active compounds may be prepared as appropriate oily injection
suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic
fatty acid esters, such as ethyl oleate or triglycerides, or liposomes. Optionally, the suspension may also
contain suitable stabilizers or agents which increase the solubility of the compounds to allow for the
preparation of highly concentrated solutions.

For topical or nasal administration, penetrants appropriate to the particular barrier to be 
permeated are used in the formulation. Such penetrants are generally known in the art. 

The pharmaceutical compositions of the present invention may be manufactured in a manner that
is known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making,
levigating, emulsifying, encapsulating, entrapping, or lyophilizing processes.

The pharmaceutical composition may be provided as a salt and can be formed with many acids,
including but not limited to, hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc. Salts tend
to be more soluble in aqueous or other protonic solvents than are the corresponding free base forms. In
other cases, the preferred preparation may be a lyophilized powder which may contain any or all of the
following: 1-50 mM histidine, 0.1%-2% sucrose, and 2-7% mannitol, at a pH range of 4.5 to 5.5, that is
combined with buffer prior to use.

After pharmaceutical compositions have been prepared, they can be placed in an appropriate
container and labeled for treatment of an indicated condition. For administration of GENSET, such
labeling would include amount, frequency, and method of administration.

Pharmaceutical compositions suitable for use in the invention include compositions wherein the 
active ingredients are contained in an effective amount to achieve the intended purpose. The 
determination of an effective dose is well within the capability of those skilled in the art.

For any compound, the therapeutically effective dose can be estimated initially either in cell
culture assays, e.g., of neoplastic cells, or in animal models, usually mice, rabbits, dogs, or pigs. The 
animal model may also be used to determine the appropriate concentration range and route of 
administration. Such information can then be used to determine useful doses and routes for administration 
in humans. 

A therapeutically effective dose refers to that amount of active ingredient, for example GENSET
or fragments thereof, antibodies of GENSET, agonists, antagonists or inhibitors of GENSET, which 
ameliorates the symptoms or condition. Therapeutic efficacy and toxicity may be determined by standard 
pharmaceutical procedures in cell cultures or experimental animals, e.g., ED50 (the dose therapeutically 
effective in 50% of the population) and LD50 (the dose lethal to 50% of the population). The dose ratio
between toxic and therapeutic effects is the therapeutic index, and it can be expressed as the ratio,
LD50/ED50. Pharmaceutical compositions which exhibit large therapeutic indices are preferred. The data 
obtained from cell culture assays and animal studies is used in formulating a range of dosage for human 
use. The dosage contained in such compositions is preferably within a range of circulating concentrations 
that include the ED50 with little or no toxicity. The dosage varies within this range depending upon the
dosage form employed, sensitivity of the patient, and the route of administration.
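
As a purely illustrative sketch of the ratio described above, the therapeutic index may be computed as
shown below. The function name and the example doses are hypothetical and are not data from this
specification; both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the ratio LD50/ED50 (hypothetical helper; doses in the same units)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Illustrative numbers only: a compound lethal to 50% of the population at
# 500 mg/kg and effective in 50% of the population at 5 mg/kg.
index = therapeutic_index(ld50=500.0, ed50=5.0)
print(index)  # 100.0
```

A larger value indicates a wider margin between the effective dose and the toxic dose.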

The exact dosage will be determined by the practitioner, in light of factors related to the subject 
that requires treatment. Dosage and administration are adjusted to provide sufficient levels of the active 
moiety or to maintain the desired effect. Factors which may be taken into account include the severity of 
the disease state, general health of the subject, age, weight, and gender of the subject, diet, time and
frequency of administration, drug combination(s), reaction sensitivities, and tolerance/response to
therapy. Long-acting pharmaceutical compositions may be administered every 3 to 4 days, every week, or
once every two weeks depending on half-life and clearance rate of the particular formulation. 

Normal dosage amounts may vary from 0.1 to 100,000 micrograms, up to a total dose of about 1 
g, depending upon the route of administration. Guidance as to particular dosages and methods of delivery
is provided in the literature and generally available to practitioners in the art. Those skilled in the art will
employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of 
polynucleotides or polypeptides will be specific to particular cells, conditions, locations, etc. 

Uses of GENSET sequences: computer-related embodiments

As used herein the term "cDNA codes of SEQ ID Nos: 1-241" encompasses the nucleotide
sequences of SEQ ID Nos: 1-241 and of clone inserts of the deposited clone pool, fragments thereof,
nucleotide sequences homologous thereto, and sequences complementary to all of the preceding
sequences. The fragments include fragments of SEQ ID Nos: 1-241 comprising at least 8, 10, 12, 15, 18,
20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of SEQ
ID Nos: 1-241. Preferably the fragments include signal sequences and coding sequences for mature
polypeptides of SEQ ID Nos: 1-31 and 33-143, polynucleotides described in Tables Va and Vb,
polynucleotides encoding polypeptides described in Table VI, polynucleotides described herein as
encoding polypeptides having a biological activity, or fragments comprising at least 8, 10, 12, 15, 18, 20,
25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of the
signal sequences or coding sequences for mature polypeptides of SEQ ID Nos: 1-31 and 33-143,
polynucleotides described in Tables Va and Vb, polynucleotides encoding polypeptides described
in Table VI, and polynucleotides described herein as encoding polypeptides having a biological activity.
Homologous sequences and fragments of SEQ ID Nos: 1-241 refer to sequences having at least 99%,
98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% identity to these sequences. Identity may be determined
using any of the computer programs and parameters described herein, including BLAST2N with the
default parameters or with any modified parameters. Homologous sequences also include RNA
sequences in which uridines replace the thymines in the cDNA codes of SEQ ID Nos: 1-241. The
homologous sequences may be obtained using any of the procedures described herein or may result from
the correction of a sequencing error as described above. Preferably the homologous sequences and
fragments of SEQ ID Nos: 1-241 include polynucleotides homologous to signal sequences and coding
sequences for mature polypeptides of SEQ ID Nos: 1-31 and 33-143, polynucleotides described in Tables
Va and Vb, polynucleotides encoding a polypeptide fragment described as a domain in Table VI,
polynucleotides described herein as encoding polypeptides having a biological activity, or fragments
comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or
2000 consecutive nucleotides of the signal sequences and coding sequences for mature polypeptides of
SEQ ID Nos: 1-31 and 33-143, polynucleotides described in Tables Va and Vb, polynucleotides
described in Table VI, and polynucleotides described herein as encoding polypeptides having a biological
activity. It will be appreciated that the cDNA codes of SEQ ID Nos: 1-241 can be represented in the
traditional single character format (See the inside back cover of Stryer, 1995) or in any other format which
records the identity of the nucleotides in a sequence.
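
By way of illustration only, the following Python sketch derives three of the related forms mentioned
above from a cDNA sequence held in single character format: the complementary strand, the RNA form
in which uridines replace the thymines, and fragments of a given number of consecutive nucleotides.
The function names are hypothetical, and the sketch assumes an unambiguous A/C/G/T alphabet.

```python
from typing import Iterator

# Translation table for Watson-Crick complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA form, with U replacing T."""
    return dna.replace("T", "U").replace("t", "u")


def fragments(seq: str, length: int) -> Iterator[str]:
    """Yield every fragment of `length` consecutive characters of `seq`."""
    for start in range(len(seq) - length + 1):
        yield seq[start:start + length]


print(reverse_complement("ATGC"))      # GCAT
print(to_rna("ATGT"))                  # AUGU
print(list(fragments("ATGCA", 4)))     # ['ATGC', 'TGCA']
```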

As used herein the term "polypeptide codes of SEQ ID Nos: 242-482" encompasses the
polypeptide sequences of SEQ ID Nos: 242-482, the signal peptides included in SEQ ID Nos: 242-272
and 274-384, the mature polypeptides included in SEQ ID Nos: 242-272 and 274-384, the full-length,
signal peptide and mature polypeptide sequences encoded by the clone inserts of the deposited clone
pool, polypeptide sequences homologous thereto, or fragments of any of the preceding sequences.
Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%,
95%, 90%, 85%, 80%, or 75% identity to one of the polypeptide sequences of SEQ ID Nos: 242-482, the
signal peptides included in SEQ ID Nos: 242-272 and 274-384, the mature polypeptides included in SEQ
ID Nos: 242-272 and 274-384, or the full-length, signal peptide and mature polypeptide sequences encoded
by the clone inserts of the deposited clone pool. Identity may be determined using any of the computer
programs and parameters described herein, including FASTA with the default parameters or with any
modified parameters. The homologous sequences may be obtained using any of the procedures described
herein or may result from the correction of a sequencing error as described above. The polypeptide
fragments comprise at least 5, 6, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150, 200, 250, 300, 350,
400, 450 or 500 consecutive amino acids of the polypeptides of SEQ ID Nos: 242-482. Preferably, the
fragments include the signal peptides included in SEQ ID Nos: 242-272 and 274-384, mature
polypeptides included in SEQ ID Nos: 242-272 and 274-384, polypeptides encoded by the polynucleotides
described in Tables Va and Vb, domains described in Table VI, epitopes described in Table VII,
polypeptides described herein as having a biological activity, or fragments comprising at least 5, 10, 15,
20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300 or 400 consecutive amino acids of the signal peptides
included in SEQ ID Nos: 242-272 and 274-384, mature polypeptides included in SEQ ID Nos: 242-272
and 274-384, the polypeptides encoded by the polynucleotides described in Tables Va and Vb,
domains of Table VI, epitopes of Table VII or polypeptides described herein as having a biological
activity. It will be appreciated that the polypeptide codes of SEQ ID Nos: 242-482 can be represented
in the traditional single character format or three letter format (See the inside back cover of Stryer, 1995)
or in any other format which records the identity of the amino acids in a sequence.
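
As an illustration of the two representations mentioned above, the following Python sketch converts a
polypeptide sequence between the three letter format and the single character format for the twenty
standard amino acids. The function names and the hyphen separator are assumptions made for the
example only.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_single_letter(three_letter: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Ser' to 'MAS'; raises KeyError on unknown residues."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split(sep))


def to_three_letter(single_letter: str, sep: str = "-") -> str:
    """Convert e.g. 'MAS' to 'Met-Ala-Ser'; raises KeyError on unknown residues."""
    return sep.join(ONE_TO_THREE[char] for char in single_letter.upper())


print(to_single_letter("Met-Ala-Ser"))  # MAS
print(to_three_letter("MAS"))           # Met-Ala-Ser
```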

It will be appreciated by those skilled in the art that the nucleic acid codes of the invention and
polypeptide codes of the invention can be stored, recorded, and manipulated on any medium which can be
read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a process
for storing information on a computer medium. A skilled artisan can readily adopt any of the presently
known methods for recording information on a computer readable medium to generate manufactures
comprising one or more of the nucleic acid codes of the invention, or one or more of the polypeptide
codes of the invention. Another aspect of the present invention is a computer readable medium having
recorded thereon at least 2, 5, 10, 15, 20, 25, 30, 50, 75, 100, 150 or 200 nucleic acid codes of the
invention. Another aspect of the present invention is a computer readable medium having recorded
thereon at least 2, 5, 10, 15, 20, 25, 30, 50, 75, 100, 150 or 200 polypeptide codes of the invention.

Computer readable media include magnetically readable media, optically readable media,
electronically readable media and magnetic/optical media. For example, the computer readable media
may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random
Access Memory (RAM), or Read Only Memory (ROM) as well as other types of media known to
those skilled in the art.

Embodiments of the present invention include systems, particularly computer systems which
store and manipulate the sequence information described herein. One example of a computer system 100 
is illustrated in block diagram form in Figure 2. As used herein, "a computer system" refers to the 
hardware components, software components, and data storage components used to analyze the nucleotide 
sequences of the nucleic acid codes of the invention or the amino acid sequences of the polypeptide codes
of the invention. In one embodiment, the computer system 100 is a Sun Enterprise 1000 server (Sun
Microsystems, Palo Alto, CA). The computer system 100 preferably includes a processor 105 for processing,
accessing and manipulating the sequence data. The processor 105 can be any well-known type of central
processing unit, such as the Pentium III from Intel Corporation, or a similar processor from Sun, Motorola,
Compaq or International Business Machines. 

Preferably, the computer system 100 is a general purpose system that comprises the processor 105
and one or more internal data storage components 110 for storing data, and one or more data retrieving
devices for retrieving the data stored on the data storage components. A skilled artisan can readily
appreciate that any one of the currently available computer systems is suitable.

In one particular embodiment, the computer system 100 includes a processor 105 connected to a
bus which is connected to a main memory 115 (preferably implemented as RAM) and one or more
internal data storage devices 110, such as a hard drive and/or other computer readable media having data
recorded thereon. In some embodiments, the computer system 100 further includes one or more data
retrieving devices 118 for reading the data stored on the internal data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk
drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device 110 is a
removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc.
containing control logic and/or data recorded thereon. The computer system 100 may advantageously
include or be programmed by appropriate software for reading the control logic and/or the data from the
data storage component once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output to a computer
user. It should also be noted that the computer system 100 can be linked to other computer systems 125a-c
in a network or wide area network to provide centralized access to the computer system 100.

Software for accessing and processing the nucleotide sequences of the nucleic acid codes of the
invention or the amino acid sequences of the polypeptide codes of the invention (such as search tools,
compare tools, and modeling tools, etc.) may reside in main memory 115 during execution.

In some embodiments, the computer system 100 may further comprise a sequence comparer for 
comparing the above-described nucleic acid codes of the invention or the polypeptide codes of the 
invention stored on a computer readable medium to reference nucleotide or polypeptide sequences stored 
on a computer readable medium. A "sequence comparer" refers to one or more programs which are
implemented on the computer system 100 to compare a nucleotide or polypeptide sequence with other 
nucleotide or polypeptide sequences and/or compounds including but not limited to peptides, 
peptidomimetics, and chemicals stored within the data storage means. For example, the sequence 
comparer may compare the nucleotide sequences of nucleic acid codes of the invention or the amino acid 
sequences of the polypeptide codes of the invention stored on a computer readable medium to reference
sequences stored on a computer readable medium to identify homologies, motifs implicated in biological 
function, or structural motifs. The various sequence comparer programs identified elsewhere in this 
patent specification are particularly contemplated for use in this aspect of the invention. 

Figure 3 is a flow diagram illustrating one embodiment of a process 200 for comparing a new 
nucleotide or protein sequence with a database of sequences in order to determine the homology levels
between the new sequence and the sequences in the database. The database of sequences can be a private 
database stored within the computer system 100, or a public database such as GENBANK, PIR or
SWISSPROT that is available through the Internet. 

The process 200 begins at a start state 201 and then moves to a state 202 wherein the new 
sequence to be compared is stored to a memory in a computer system 100. As discussed above, the
memory could be any type of memory, including RAM or an internal storage device. 

The process 200 then moves to a state 204 wherein a database of sequences is opened for analysis
and comparison. The process 200 then moves to a state 206 wherein the first sequence stored in the
database is read into a memory on the computer. A comparison is then performed at a state 210 to
determine if the first sequence is the same as the new sequence. It is important to note that this step is
not limited to performing an exact comparison between the new sequence and the first sequence in the
database. Methods are well known to those of skill in the art for comparing two nucleotide or
protein sequences, even if they are not identical. For example, gaps can be introduced into one sequence
in order to raise the homology level between the two tested sequences. The parameters that control
whether gaps or other features are introduced into a sequence during comparison are normally entered by
the user of the computer system.

Once a comparison of the two sequences has been performed at the state 210, a determination is
made at a decision state 212 whether the two sequences are the same. Of course, the term "same" is not
limited to sequences that are absolutely identical. Sequences that are within the homology parameters
entered by the user will be marked as "same" in the process 200.

If a determination is made that the two sequences are the same, the process 200 moves to a state 
214 wherein the name of the sequence from the database is displayed to the user. This state notifies the 
user that the sequence with the displayed name fulfills the homology constraints that were entered. Once
the name of the stored sequence is displayed to the user, the process 200 moves to a decision state 218
wherein a determination is made whether more sequences exist in the database. If no more sequences 
exist in the database, then the process 200 terminates at an end state 220. However, if more sequences do 
exist in the database, then the process 200 moves to a state 224 wherein a pointer is moved to the next 
sequence in the database so that it can be compared to the new sequence. In this manner, the new
sequence is aligned and compared with every sequence in the database.

It should be noted that if a determination had been made at the decision state 212 that the 
sequences were not homologous, then the process 200 would move immediately to the decision state 218 
in order to determine if any other sequences were available in the database for comparison. 
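
The loop of process 200 can be sketched in Python as shown below. This is a minimal illustration under
stated assumptions, not the implementation contemplated by Figure 3: the database is assumed to be an
in-memory dictionary of names and sequences, the function names are hypothetical, and the comparison
uses the similarity ratio of the standard library difflib module as a stand-in for a program such as
BLAST or FASTA. The state numbers in the comments refer to the states described above.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def similarity(a: str, b: str) -> float:
    """Stand-in comparer returning a ratio in [0, 1]; not BLAST or FASTA."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def search_database(
    new_sequence: str,                      # state 202: new sequence in memory
    database: Dict[str, str],               # state 204: database opened
    threshold: float,                       # homology parameter entered by the user
    compare: Callable[[str, str], float] = similarity,
) -> List[str]:
    """Return the names of stored sequences that count as the 'same'."""
    hits = []
    for name, stored in database.items():   # states 206 and 224: read next sequence
        score = compare(new_sequence, stored)    # state 210: comparison
        if score >= threshold:               # decision state 212
            hits.append(name)                # state 214: name reported to the user
    return hits                              # end state 220: no more sequences


# Hypothetical example data.
db = {"seqA": "ATGCATGC", "seqB": "TTTTTTTT", "seqC": "ATGCATGA"}
print(search_database("ATGCATGC", db, threshold=0.8))
```

Passing the comparer as a parameter reflects the statement above that any of the known comparison
methods may be used at the state 210.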

Accordingly, one aspect of the present invention is a computer system comprising a processor and a
data storage device having stored thereon a nucleic acid code of the invention or a polypeptide code of the
invention. In some embodiments the computer system further comprises a data storage device having
retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be compared to the 
nucleic acid code of the invention or polypeptide code of the invention and a sequence comparer for 
conducting the comparison. For example, the sequence comparer may comprise a computer program
which indicates polymorphisms. In other aspects of the computer system, the system further comprises an
identifier which identifies features in said sequence. The sequence comparer may indicate a homology 
level between the sequences compared or identify motifs implicated in biological function and structural 
motifs in the nucleic acid code of the invention and polypeptide codes of the invention or it may identify 
structural motifs in sequences which are compared to these nucleic acid codes and polypeptide codes. In
some embodiments, the data storage device may have stored thereon the sequences of at least 2, 5, 10, 15,
20, 25, 30, 50, 75, 100, 150 or 200 of the nucleic acid codes of the invention or polypeptide codes of the 
invention. 

Another aspect of the present invention is a method for determining the level of homology 
between a nucleic acid code of the invention and a reference nucleotide sequence, comprising the steps of
reading the nucleic acid code and the reference nucleotide sequence through the use of a computer
program which determines homology levels and determining homology between the nucleic acid code 
and the reference nucleotide sequence with the computer program. The computer program may be any of 
a number of computer programs for determining homology levels, including those specifically 
enumerated herein, including BLAST2N with the default parameters or with any modified parameters.
The method may be implemented using the computer systems described above. The method may also be 
performed by reading 2, 5, 10, 15, 20, 25, 30, 50, 75, 100, 150 or 200 of the above described nucleic acid 
codes of the invention through the use of the computer program and determining homology between the 
nucleic acid codes and reference nucleotide sequences. 

Figure 4 is a flow diagram illustrating one embodiment of a process 250 in a computer for
determining whether two sequences are homologous. The process 250 begins at a start state 252 and then 
moves to a state 254 wherein a first sequence to be compared is stored to a memory. The second 
sequence to be compared is then stored to a memory at a state 256. The process 250 then moves to a state 
260 wherein the first character in the first sequence is read and then to a state 262 wherein the first
character of the second sequence is read. It should be understood that if the sequence is a nucleotide
sequence, then the character would normally be either A, T, C, G or U. If the sequence is a protein
sequence, then it should be in the single letter amino acid code so that the first and second sequences
can be easily compared. 

A determination is then made at a decision state 264 whether the two characters are the same. If
they are the same, then the process 250 moves to a state 268 wherein the next characters in the first and
second sequences are read. A determination is then made whether the next characters are the same. If
they are, then the process 250 continues this loop until two characters are not the same. If a determination
is made that the next two characters are not the same, the process 250 moves to a decision state 274 to
determine whether there are any more characters in either sequence to read.

If there are no more characters to read, then the process 250 moves to a state 276 wherein the
level of homology between the first and second sequences is displayed to the user. The level of homology
is determined by calculating the proportion of characters between the sequences that were the same out of
the total number of characters in the first sequence. Thus, if every character in a first 100 nucleotide
sequence aligned with every character in a second sequence, the homology level would be 100%.
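
The homology calculation of process 250 can be illustrated with the following simplified Python sketch.
It assumes that the two sequences are compared position by position without introducing gaps, and it
counts matches over all positions rather than reproducing the individual states of Figure 4. The
function name is hypothetical.

```python
def homology_level(first: str, second: str) -> float:
    """Percentage of characters of `first` matched at the same position in `second`.

    The denominator is the number of characters in the first sequence, as in
    the description above. Positions beyond the end of the shorter sequence
    count as mismatches.
    """
    if not first:
        raise ValueError("the first sequence must not be empty")
    matches = sum(
        1 for a, b in zip(first.upper(), second.upper()) if a == b
    )
    return 100.0 * matches / len(first)


print(homology_level("ATGC", "ATGC"))  # 100.0
print(homology_level("ATGC", "ATGA"))  # 75.0
```

For sequences that differ by insertions or deletions, a gapped alignment program of the kind referred to
elsewhere in this specification would be used in place of this position-by-position count.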

Alternatively, the computer program may be a computer program which compares the nucleotide
sequences of the nucleic acid codes of the present invention to reference nucleotide sequences in order to
determine whether the nucleic acid code of the invention differs from a reference nucleic acid sequence at
one or more positions. Optionally such a program records the length and identity of inserted, deleted or
substituted nucleotides with respect to the sequence of either the reference polynucleotide or the nucleic
acid code of the invention. In one embodiment, the computer program may be a program which
determines whether the nucleotide sequences of the nucleic acid codes of the invention contain one or
more single nucleotide polymorphisms (SNP) with respect to a reference nucleotide sequence. These
single nucleotide polymorphisms may each comprise a single base substitution, insertion, or deletion.
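
A program of this kind can be sketched as follows. The sketch is illustrative only and assumes that the
two sequences have already been aligned to equal length by another program, with the character "-"
marking a gap; the function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# (position in the ungapped reference, kind, reference base, query base)
Difference = Tuple[int, str, str, str]


def find_differences(reference: str, query: str) -> List[Difference]:
    """Record substitutions, insertions and deletions between aligned sequences.

    Positions are 1-based and count only non-gap reference characters, so an
    insertion is reported at the reference position that precedes it.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")
    differences = []
    ref_pos = 0
    for ref_base, query_base in zip(reference.upper(), query.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == query_base:
            continue
        if ref_base == "-":
            kind = "insertion"
        elif query_base == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        differences.append((ref_pos, kind, ref_base, query_base))
    return differences


print(find_differences("ATGCA", "ATGTA"))  # [(4, 'substitution', 'C', 'T')]
```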

Another embodiment of the present invention is a method for comparing a first sequence to a
reference sequence wherein the first sequence is selected from the group consisting of a cDNA code of
SEQ ID NOs. 1-297 and a polypeptide code of SEQ ID NOs. 298-594 comprising the steps of reading the
first sequence and the reference sequence through use of a computer program which compares sequences
and determining differences between the first sequence and the reference sequence with the computer
program. In some aspects of this embodiment, said step of determining differences between the first
sequence and the reference sequence comprises identifying polymorphisms.

Another aspect of the present invention is a method for determining the level of homology 
between a polypeptide code of the invention and a reference polypeptide sequence, comprising the steps 
of reading the polypeptide code of the invention and the reference polypeptide sequence through use of a
computer program which determines homology levels and determining homology between the
polypeptide code and the reference polypeptide sequence using the computer program. 

Accordingly, another aspect of the present invention is a method for determining whether a 
nucleic acid code of the invention differs at one or more nucleotides from a reference nucleotide sequence 
comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through use
of a computer program which identifies differences between nucleic acid sequences and identifying
differences between the nucleic acid code and the reference nucleotide sequence with the computer 
program. In some embodiments, the computer program is a program which identifies single nucleotide
polymorphisms. The method may be implemented by the computer systems described above and the
method illustrated in Figure 4. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25,
30, 50, 75, 100, 150 or 200 of the nucleic acid codes of the invention and the reference nucleotide
sequences through the use of the computer program and identifying differences between the nucleic acid 
codes and the reference nucleotide sequences with the computer program. 

Thus, another embodiment of the present invention is a method for comparing a first sequence to 
a reference sequence wherein the first sequence is selected from the group consisting of the nucleic acid
codes of the present invention or the polypeptide codes of the present invention comprising the steps of
reading the first sequence and the reference sequence through use of a computer program which compares 
sequences and determining differences between the first sequence and the reference sequence with the 
computer program. In some aspects of this embodiment, said step of determining differences between the 
first sequence and the reference sequence comprises identifying polymorphisms.

Another aspect of the present invention is a method for determining the level of identity between
a first sequence and a reference sequence, wherein the first sequence is selected from the group consisting
of the nucleic acid codes of the present invention or the polypeptide codes of the present invention,
comprising the steps of reading the first sequence and the reference sequence through the use of a
computer program which determines identity levels and determining identity between the first sequence
and the reference sequence with the computer program. 

In other embodiments the computer based system may further comprise an identifier for 
identifying features within the nucleotide sequences of the nucleic acid codes of the invention or the 
amino acid sequences of the polypeptide codes of the invention. An "identifier" refers to one or more 
programs which identify certain features within the above-described nucleotide sequences of the nucleic
acid codes of the invention or the amino acid sequences of the polypeptide codes of the invention. In one
embodiment, the identifier may comprise a program which identifies an open reading frame in the cDNA
codes of the invention.

Another embodiment of the present invention is a method for identifying a feature in a sequence
selected from the group consisting of the nucleic acid codes of the invention or the amino acid sequences
of the polypeptide codes of the invention comprising the steps of reading the sequence through the use of
a computer program which identifies features in sequences and identifying features in the sequence with
said computer program. In one aspect of this embodiment, the computer program comprises a computer
program which identifies open reading frames. In a further embodiment, the computer program
comprises a program that identifies linear or structural motifs in a polypeptide sequence.
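
As one illustration of a program which identifies open reading frames, the following Python sketch scans
the three forward reading frames of a cDNA sequence for an ATG codon followed in frame by a stop codon.
It is a minimal sketch under stated assumptions: only the forward strand is examined, the standard stop
codons TAA, TAG and TGA are used, and the function name and the minimum-length parameter are
hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for each forward-strand open reading frame.

    Coordinates are 0-based with an exclusive end; the stop codon is included.
    `min_codons` counts the start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # look for the next ATG
    return orfs


print(find_orfs("CCATGAAATGACC"))  # [(2, 11, 'ATGAAATGA')]
```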

Figure 5 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting 
the presence of a feature in a sequence. The process 300 begins at a start state 302 and then moves to a 
state 304 wherein a first sequence that is to be checked for features is stored to a memory 115 in the 
computer system 100. The process 300 then moves to a state 306 wherein a database of sequence features
is opened. Such a database would include a list of each feature's attributes along with the name of the
feature. For example, a feature name could be "Initiation Codon" and the attribute would be "ATG". 
Another example would be the feature name "TAATAA Box" and the feature attribute would be 
"TAATAA". An example of such a database is produced by the University of Wisconsin Genetics 
Computer Group (www.gcg.com). 

Once the database of features is opened at the state 306, the process 300 moves to a state 308
wherein the first feature is read from the database. A comparison of the attribute of the first feature with 
the first sequence is then made at a state 310. A determination is then made at a decision state 316 
whether the attribute of the feature was found in the first sequence. If the attribute was found, then the 
process 300 moves to a state 318 wherein the name of the found feature is displayed to the user.

The process 300 then moves to a decision state 320 wherein a determination is made whether
more features exist in the database. If no more features exist, then the process 300 terminates at an
end state 324. However, if more features do exist in the database, then the process 300 reads the next
sequence feature at a state 326 and loops back to the state 310 wherein the attribute of the next feature is
compared against the first sequence.

It should be noted that if the feature attribute is not found in the first sequence at the decision
state 316, the process 300 moves directly to the decision state 320 in order to determine if any more 
features exist in the database. 
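
The identifier process 300 can be sketched in Python as shown below. The feature database is assumed,
for the purpose of the example only, to be a dictionary mapping each feature name to a single attribute
string, using the two examples given above; the function name is hypothetical, and the state numbers in
the comments refer to the states of Figure 5.

```python
from typing import Dict, List, Tuple

# Example feature database built from the two features named in the text.
FEATURES = {
    "Initiation Codon": "ATG",
    "TAATAA Box": "TAATAA",
}


def identify_features(
    sequence: str,                 # state 304: sequence stored to memory
    features: Dict[str, str],      # state 306: feature database opened
) -> List[Tuple[str, int]]:
    """Return (feature name, 0-based position of first occurrence) pairs."""
    sequence = sequence.upper()
    found = []
    for name, attribute in features.items():   # states 308 and 326: read feature
        position = sequence.find(attribute)     # state 310: compare attribute
        if position != -1:                      # decision state 316
            found.append((name, position))      # state 318: report the feature
    return found                                # end state 324: no more features


print(identify_features("GGTAATAACCATGG", FEATURES))
# [('Initiation Codon', 10), ('TAATAA Box', 2)]
```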

In another embodiment, the identifier may comprise a molecular modeling program which
determines the 3-dimensional structure of the polypeptide codes of the invention. Such programs may
use any methods known to those skilled in the art including methods based on homology-modeling, fold
recognition and ab initio methods as described in Sternberg et al., 1999, which disclosure is hereby
incorporated by reference in its entirety. In some embodiments, the molecular modeling program
identifies target sequences that are most compatible with profiles representing the structural environments
of the residues in known three-dimensional protein structures. (See, e.g., Eisenberg et al., U.S. Patent
No. 5,436,850 issued July 25, 1995, which disclosure is hereby incorporated by reference in its entirety).
In another technique, the known three-dimensional structures of proteins in a given family are
superimposed to define the structurally conserved regions in that family. This protein modeling technique
also uses the known three-dimensional structure of a homologous protein to approximate the structure of
the polypeptide codes of the invention. (See e.g., Srinivasan, et al., U.S. Patent No. 5,557,535 issued
September 17, 1996, which disclosure is hereby incorporated by reference in its entirety). Conventional
homology modeling techniques have been used routinely to build models of proteases and antibodies.
(Sowdhamini et al., (1997)). Comparative approaches can also be used to develop three-dimensional
protein models when the protein of interest has poor sequence identity to template proteins. In some
cases, proteins fold into similar three-dimensional structures despite having very weak sequence
identities. For example, a number of helical cytokines fold into a similar three-dimensional topology
in spite of weak sequence homology.

The recent development of threading methods now enables the identification of likely folding 
patterns in a number of situations where the structural relatedness between target and template(s) is not 
detectable at the sequence level. Hybrid methods may also be used, in which fold recognition is 
performed using Multiple Sequence Threading (MST), structural equivalencies deduced from the threading 
output are passed to the distance geometry program DRAGON to construct a low resolution model, and a 
full-atom representation is constructed using a molecular modeling package such as QUANTA. 






According to this 3-step approach, candidate templates are first identified by using the novel fold 
recognition algorithm MST, which is capable of performing simultaneous threading of multiple aligned 
sequences onto one or more 3-D structures. In a second step, the structural equivalencies obtained from 
the MST output are converted into interresidue distance restraints and fed into the distance geometry 
program DRAGON, together with auxiliary information obtained from secondary structure predictions. 
The program combines the restraints in an unbiased manner and rapidly generates a large number of low 
resolution model conformations. In a third step, these low resolution model conformations are converted 
into full-atom models and subjected to energy minimization using the molecular modeling package 
QUANTA. (See e.g., Aszodi et al., (1997)). 
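
The second step above can be illustrated with a minimal Python sketch. It is not the MST or DRAGON implementation. It assumes, purely for illustration, that the threading output is available as a mapping from target residue numbers to template residue numbers and that the template C-alpha coordinates are known. The function name distance_restraints and both data layouts are hypothetical.

```python
import math
from itertools import combinations


def distance_restraints(equivalences, template_ca):
    """Turn residue equivalences into interresidue distance restraints.

    equivalences: dict mapping target residue number -> template residue number
                  (an assumed representation of the threading output).
    template_ca:  dict mapping template residue number -> (x, y, z) coordinates.
    Returns a list of (target_i, target_j, distance) tuples, one per pair of
    aligned target residues, using the distance between the equivalent
    template atoms as the restraint.
    """
    restraints = []
    for i, j in combinations(sorted(equivalences), 2):
        d = math.dist(template_ca[equivalences[i]], template_ca[equivalences[j]])
        restraints.append((i, j, d))
    return restraints
```

A restraint list of this kind is the sort of input a distance geometry program could combine with secondary structure predictions; how DRAGON actually weights or filters restraints is not shown here.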

The results of the molecular modeling analysis may then be used in rational drug design 
techniques to identify agents which modulate the activity of the polypeptide codes of the invention. 

Accordingly, another aspect of the present invention is a method of identifying a feature within 
the nucleic acid codes of the invention or the polypeptide codes of the invention comprising reading the 
nucleic acid code(s) or the polypeptide code(s) through the use of a computer program which identifies 
features therein and identifying features within the nucleic acid code(s) or polypeptide code(s) with the 
computer program. In one embodiment, the computer program comprises a computer program which 
identifies open reading frames. In a further embodiment, the computer program identifies linear or 
structural motifs in a polypeptide sequence. In another embodiment, the computer program comprises a 
molecular modeling program. The method may be performed by reading a single sequence or at least 2, 
5, 10, 15, 20, 25, 30, 50, 75, 100, 150 or 200 of the nucleic acid codes of the invention or the polypeptide 
codes of the invention through the use of the computer program and identifying features within the 
nucleic acid codes or polypeptide codes with the computer program. 
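
As an illustration of a computer program which identifies open reading frames, the following Python sketch scans the three forward reading frames of a nucleic acid code for stretches that begin with ATG and end at the first in-frame stop codon. It is a minimal sketch, not any program named in this disclosure; the function name find_orfs and the default minimum length of 30 codons are assumptions chosen for the example, and the reverse strand is not scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the coding codons (ATG included, stop excluded);
    the default of 30 is an arbitrary illustrative threshold.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs
```

The same function can be applied in a loop to a single sequence or to any number of stored nucleic acid codes.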

The nucleic acid codes of the invention or the polypeptide codes of the invention may be stored 
and manipulated in a variety of data processor programs in a variety of formats. For example, they may 
be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT, or as an ASCII 
file in a variety of database programs familiar to those of skill in the art, such as DB2, SYBASE, or 
ORACLE. In addition, many computer programs and databases may be used as sequence comparers, 
identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to the nucleic acid 
codes of the invention or the polypeptide codes of the invention. The following list is intended not to limit 
the invention but to provide guidance to programs and databases which are useful with the nucleic acid 
codes of the invention or the polypeptide codes of the invention. The programs and databases which may 
be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications 
Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook 
(Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., 
1990), FASTA (Pearson and Lipman, 1988), FASTDB (Brutlag et al., 1990), Catalyst (Molecular 
Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular 
Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), 
Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular 
Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), 
Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular 
Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations 
Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations 
Inc.), SeqFold (Molecular Simulations Inc.), the EMBL/Swissprotein database, the MDL Available 
Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal 
Chemistry database, Derwent's World Drug Index database, the BioByteMasterFile database, the 
Genbank database, and the Genseqn database. Many other programs and databases would be apparent to 
one of skill in the art given the present disclosure. 
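
Storage of sequence codes as plain ASCII text can be sketched as follows. The example assumes the widely used FASTA text layout (a header line beginning with ">" followed by wrapped sequence lines); this layout, the function names write_fasta and read_fasta, and the 60-character line width are assumptions for illustration and are not prescribed by this disclosure.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (name, sequence) pairs to a text handle in FASTA layout."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def read_fasta(handle):
    """Read (name, sequence) pairs back from a FASTA-formatted text handle."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

Because the handle can be any text stream, the same functions work with an ordinary file opened in text mode or with an in-memory buffer such as io.StringIO.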

Motifs which may be detected using the above programs include sequences encoding leucine 
zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and beta sheets, 
signal sequences encoding signal peptides which direct the secretion of the encoded proteins, sequences 
implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, 
substrate binding sites, and enzymatic cleavage sites. 
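
Detection of linear motifs of this kind can be sketched with regular expressions. The Python example below scans a polypeptide code for two patterns written in the style of the PROSITE consensus patterns for N-glycosylation sites (N-{P}-[ST]-{P}) and leucine zippers (L-x(6)-L-x(6)-L-x(6)-L). The choice of these two patterns, the dictionary MOTIFS and the function name scan_motifs are assumptions for illustration; the programs listed above may use different or more elaborate definitions.

```python
import re

# Lookahead groups let overlapping occurrences be reported.
MOTIFS = {
    "N-glycosylation site": re.compile(r"(?=(N[^P][ST][^P]))"),
    "leucine zipper": re.compile(r"(?=(L.{6}L.{6}L.{6}L))"),
}


def scan_motifs(protein):
    """Return (motif name, 1-based start position, matched text) tuples,
    sorted by start position."""
    protein = protein.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(protein):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])
```

A match to such a pattern only flags a candidate site; short consensus patterns occur by chance, so hits would normally be confirmed by other means.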

Conclusion 

As discussed above, the GENSET polynucleotides and polypeptides of the present invention or 
fragments thereof can be used for various purposes. The polynucleotides can be used to express 
recombinant protein for analysis, characterization or therapeutic use; as markers for tissues in which the 
corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue 
differentiation or development or in disease states); as molecular weight markers on Southern gels; as 
chromosome markers or tags (when labeled) to identify chromosomes or to map related gene positions; as 
a reagent (including a labeled reagent) in assays designed to quantitatively determine levels of GENSET 
expression in biological samples; to compare with endogenous DNA sequences in patients to identify 
potential genetic disorders; as probes to hybridize and thus discover novel, related DNA sequences; as a 
source of information to derive PCR primers for genetic fingerprinting; for selecting and making 
oligomers for attachment to a "gene chip" or other support, including for examination for expression 
patterns; to raise anti-protein antibodies using DNA immunization techniques; and as an antigen to raise 
anti-DNA antibodies or elicit another immune response. Where the polynucleotide encodes a protein 
which binds or potentially binds to another protein (such as, for example, in a receptor-ligand interaction), 
the polynucleotide can also be used in interaction trap assays (such as, for example, that described in 
Gyuris et al., (1993)) to identify polynucleotides encoding the other protein with which binding occurs or 
to identify inhibitors of the binding interaction. 

The proteins or polypeptides provided by the present invention can similarly be used in assays to 
determine biological activity, including in a panel of multiple proteins for high-throughput screening; to 
raise antibodies or to elicit another immune response; as a reagent (including the labeled reagent) in 
assays designed to quantitatively determine levels of the protein (or its receptor) in biological fluids; as 
markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or 
at a particular stage of tissue differentiation or development or in a disease state); and, of course, to isolate 
correlative receptors or ligands. Where the protein binds or potentially binds to another protein (such as, 
for example, in a receptor-ligand interaction), the protein can be used to identify the other protein with 
which binding occurs or to identify inhibitors of the binding interaction. Proteins involved in these 
binding interactions can also be used to screen for peptide or small molecule inhibitors or agonists of the 
binding interaction. 

Any or all of these research utilities are capable of being developed into reagent grade or kit 
format for commercialization as research products. 

Methods for performing the uses listed above are well known to those skilled in the art. 
References disclosing such methods include without limitation "Molecular Cloning; A Laboratory 
Manual", 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E.F. Fritsch and T. Maniatis eds., 
1989, and "Methods in Enzymology; Guide to Molecular Cloning Techniques", Academic Press, Berger 
and Kimmel eds., 1987, which disclosures are hereby incorporated by reference in their entireties. 

Polynucleotides and proteins of the present invention can also be used as nutritional sources or 
supplements. Such uses include without limitation use as a protein or amino acid supplement, use as a 
carbon source, use as a nitrogen source and use as a source of carbohydrate. In such cases the protein or 
polynucleotide of the invention can be added to the feed of a particular organism or can be administered 
as a separate solid or liquid preparation, such as in the form of powder, pills, solutions, suspensions or 
capsules. In the case of microorganisms, the protein or polynucleotide of the invention can be added to 
the medium in or on which the microorganism is cultured. 

Although this invention has been described in terms of certain preferred embodiments, other 
embodiments which will be apparent to those of ordinary skill in the art in view of the disclosure herein 
are also within the scope of this invention. Accordingly, the scope of the invention is intended to be 
defined only by reference to the appended claims. 

Examples 

Preparation of Antibody Compositions to GENSET proteins 

Substantially pure protein or polypeptide is isolated from transfected or transformed cells 
containing an expression vector encoding a GENSET protein or a portion thereof. The concentration of 
protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the 
level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as 
follows: 

A. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes in the GENSET protein or a portion thereof can be prepared 
from murine hybridomas according to the classical method of Kohler and Milstein, (1975) or derivative 
methods thereof. Also see Harlow and Lane, (1988). 

Briefly, a mouse is repetitively inoculated with a few micrograms of the GENSET protein or a 
portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody producing 
cells of the spleen are isolated. The spleen cells are fused by means of polyethylene glycol with mouse 
myeloma cells, and the excess unfused cells are destroyed by growth of the system on selective media 
comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the 
dilution are placed in wells of a microtiter plate where growth of the culture is continued. 
Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by 
immunoassay procedures, such as ELISA, as originally described by Engvall, (1980), which disclosure is 
hereby incorporated by reference in its entirety, and derivative methods thereof. Selected positive clones 
can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for 
monoclonal antibody production are described in Davis et al. (1986) Section 21-2. 

B. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the GENSET protein or 
a portion thereof can be prepared by immunizing a suitable non-human animal with the GENSET protein or 
a portion thereof, which can be unmodified or modified to enhance immunogenicity. A suitable 
non-human animal is preferably a non-human mammal, usually a mouse, rat, rabbit, goat, or horse. 
Alternatively, a crude preparation which has been enriched for GENSET concentration can be used to 
generate antibodies. Such proteins, fragments or preparations are introduced into the non-human 
mammal in the presence of an appropriate adjuvant (e.g. aluminum hydroxide, RIBI, etc.) which is known 
in the art. In addition the protein, fragment or preparation can be pretreated with an agent which will 
increase antigenicity; such agents are known in the art and include, for example, methylated bovine serum 
albumin (mBSA), bovine serum albumin (BSA), Hepatitis B surface antigen, and keyhole limpet 
hemocyanin (KLH). Serum from the immunized animal is collected, treated and tested according to 
known procedures. If the serum contains polyclonal antibodies to undesired epitopes, the polyclonal 
antibodies can be purified by immunoaffinity chromatography. 

Effective polyclonal antibody production is affected by many factors related both to the antigen 
and the host species. Also, host animals vary in response to site of inoculations and dose, with both 
inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen 
administered at multiple intradermal sites appear to be most reliable. Techniques for producing and 
processing polyclonal antisera are known in the art. An effective immunization protocol for rabbits can 
be found in Vaitukaitis et al. (1971), which disclosure is hereby incorporated by reference in its entirety. 

Booster injections can be given at regular intervals, and antiserum is harvested when the antibody titer 
thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against 
known concentrations of the antigen, begins to fall. See, for example, Ouchterlony et al., (1973), which 
disclosure is hereby incorporated by reference in its entirety. Plateau concentration of antibody is usually 
in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM). Affinity of the antisera for the antigen is 
determined by preparing competitive binding curves, as described, for example, by Fisher (1980), which 
disclosure is hereby incorporated by reference in its entirety. 

Antibody preparations prepared according to either the monoclonal or the polyclonal protocol are 
useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in 
biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of 
antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing 
cells expressing the protein or reducing the levels of the protein in the body. 

Biological assays 

Assaying GENSET Secreted Proteins to Determine Whether they Bind to the Cell Surface 

The secreted proteins encoded by the GENSET cDNAs, preferably the proteins of SEQ ID NOs: 
242-272 and 274-384, or fragments thereof are cloned into expression vectors. The proteins are purified 
by size, charge, immunochromatography or other techniques familiar to those skilled in the art. 
Following purification, the proteins are labeled using techniques known to those skilled in the art. The 
labeled proteins are incubated with cells or cell lines derived from a variety of organs or tissues to allow 
the proteins to bind to any receptor present on the cell surface. Following the incubation, the cells are 
washed to remove non-specifically bound protein. The labeled proteins are detected by autoradiography. 
Alternatively, unlabeled proteins may be incubated with the cells and detected with antibodies having a 
detectable label, such as a fluorescent molecule, attached thereto. 

Specificity of cell surface binding may be analyzed by conducting a competition analysis in 
which various amounts of unlabeled protein are incubated along with the labeled protein. The amount of 
labeled protein bound to the cell surface decreases as the amount of competitive unlabeled protein 
increases. As a control, various amounts of an unlabeled protein unrelated to the labeled protein are 
included in some binding reactions. The amount of labeled protein bound to the cell surface does not 
decrease in binding reactions containing increasing amounts of unrelated unlabeled protein, indicating 
that the protein encoded by the cDNA binds specifically to the cell surface. 
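
The logic of this competition analysis can be summarized in a short Python sketch. The signal from bound labeled protein at each competitor dose is expressed as a percentage of the signal obtained without competitor, and binding is scored as specific when the unlabeled protein itself displaces the label while the unrelated control protein does not. The function names and the cut-offs (at least 50 percent displacement by the competitor, no more than 20 percent change with the control) are arbitrary assumptions for illustration, not values taken from this disclosure.

```python
def percent_bound(signal, signal_no_competitor):
    """Express each bound-label measurement as a percentage of the
    measurement made in the absence of any unlabeled protein."""
    return [100.0 * s / signal_no_competitor for s in signal]


def is_specific(competitor_pct, control_pct, drop=50.0, tolerance=20.0):
    """Score a competition experiment, with doses listed in increasing order.

    competitor_pct: percent bound with increasing unlabeled test protein.
    control_pct:    percent bound with increasing unrelated unlabeled protein.
    drop and tolerance are hypothetical illustrative thresholds.
    """
    displaced = 100.0 - competitor_pct[-1]          # loss at the highest dose
    control_change = abs(100.0 - control_pct[-1])   # drift with the control
    return displaced >= drop and control_change <= tolerance
```

In practice the full dose-response curve, replicate variability and non-specific background would also be considered before concluding that binding is specific.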

As discussed herein, secreted proteins have been shown to have a number of important 
physiological effects and, consequently, represent a valuable therapeutic resource. The secreted proteins 
encoded by the cDNAs or fragments thereof made using any of the methods described therein may be 
evaluated to determine their physiological activities as described below. 

Assaying GENSET proteins or Fragments Thereof for Cytokine, Cell Proliferation or Cell Differentiation 
Activity 

Secreted proteins may act as cytokines or may affect cellular proliferation or differentiation. 
Many protein factors discovered to date, including all known cytokines, have exhibited activity in one or 
more factor dependent cell proliferation assays, and hence the assays serve as a convenient confirmation 
of cytokine activity. The activity of a protein of the present invention is evidenced by any one of a 
number of routine factor dependent cell proliferation assays for cell lines including, without limitation, 
32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, 
CTLL2, TF-1, Mo7e and CMK. The proteins encoded by the cDNAs of the invention or fragments 
thereof may be evaluated for their ability to regulate T cell or thymocyte proliferation in assays such as 
those described above or in the following references, which are incorporated herein by reference: Current 
Protocols in Immunology, Ed. by J.E. Coligan et al., Greene Publishing Associates and 
Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500, 1986; Bertagnolli et al., J. Immunol. 
145:1706-1712, 1990; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Bertagnolli et al., 
J. Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol. 152:1756-1761, 1994. 

In addition, numerous assays for cytokine production and/or the proliferation of spleen cells, 
lymph node cells and thymocytes are known. These include the techniques disclosed in Current Protocols 
in Immunology, J.E. Coligan et al. Eds., Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. 1994; 
and Schreiber, R.D., Current Protocols in Immunology, supra, Vol 1 pp. 6.8.1-6.8.8, John Wiley and 
Sons, Toronto. 1994. 

The proteins encoded by the cDNAs of the invention may also be assayed for the ability to 
regulate the proliferation and differentiation of hematopoietic or lymphopoietic cells. Many assays for 
such activity are familiar to those skilled in the art, including the assays in the following references, 
which are incorporated herein by reference: Bottomly, K., Davis, L.S. and Lipsky, P.E., Measurement of 
Human and Murine Interleukin 2 and Interleukin 4, Current Protocols in Immunology, J.E. Coligan et al. 
Eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and Sons, Toronto. 1991; deVries et al., J. Exp. Med. 
173:1205-1211, 1991; Moreau et al., Nature 36:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. 
U.S.A. 80:2931-2938, 1983; Nordan, R., Measurement of Mouse and Human Interleukin 6, Current Protocols 
in Immunology, J.E. Coligan et al. Eds. Vol 1 pp. 6.6.1-6.6.5, John Wiley and Sons, Toronto. 1991; Smith 
et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986; Bennett, F., Giannotti, J., Clark, S.C. and Turner, 
K.J., Measurement of Human Interleukin 11, Current Protocols in Immunology, J.E. Coligan et al. Eds. 
Vol 1 pp. 6.15.1, John Wiley and Sons, Toronto. 1991; Ciarletta, A., Giannotti, J., Clark, S.C. and Turner, 
K.J., Measurement of Mouse and Human Interleukin 9, Current Protocols in Immunology, J.E. Coligan et 
al. Eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. 1991. 

The proteins encoded by the cDNAs of the invention may also be assayed for their ability to 
regulate T-cell responses to antigens. Many assays for such activity are familiar to those skilled in the art, 
including the assays described in the following references, which are incorporated herein by reference: 
Chapter 3 (In vitro Assays for Mouse Lymphocyte Function), Chapter 6 (Cytokines and Their Cellular 
Receptors) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. 
Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Weinberger et al., Proc. Natl. 
Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al., Eur. J. Immun. 11:405-411, 1981; Takai et al., J. 
Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988. 

Those proteins which exhibit cytokine, cell proliferation, or cell differentiation activity may then 
be formulated as pharmaceuticals and used to treat clinical conditions in which induction of cell 
proliferation or differentiation is beneficial. Alternatively, as described in more detail below, genes 
encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced 
into appropriate host cells to increase or decrease the expression of the proteins as desired. 

Assaying GENSET proteins or Fragments Thereof for Activity as Immune System Regulators 

The proteins encoded by the cDNAs of the invention may also be evaluated for their effects as 
immune regulators. For example, the proteins may be evaluated for their activity to influence thymocyte 
or splenocyte cytotoxicity. Numerous assays for such activity are familiar to those skilled in the art 
including the assays described in the following references, which are incorporated herein by reference: 
Chapter 3 (In vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic 
Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing 
Associates and Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492, 1981; 
Herrmann et al., J. Immunol. 128:1968-1974, 1982; Handa et al., J. Immunol. 135:1564-1572, 1985; 
Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman 
et al., J. Virology 61:1992-1998; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Brown et 
al., J. Immunol. 153:3079-3092, 1994. 

The proteins encoded by the cDNAs of the invention may also be evaluated for their effects on 
T-cell dependent immunoglobulin responses and isotype switching. Numerous assays for such activity are 
familiar to those skilled in the art, including the assays disclosed in the following references, which are 
incorporated herein by reference: Maliszewski, J. Immunol. 144:3028-3033, 1990; Mond, J.J. and 
Brunswick, M., Assays for B Cell Function: In vitro Antibody Production, Vol 1 pp. 3.8.1-3.8.16 in 
Current Protocols in Immunology, J.E. Coligan et al. Eds., John Wiley and Sons, Toronto. 1994. 

The proteins encoded by the cDNAs of the invention may also be evaluated for their effect on 
immune effector cells, including their effect on Th1 cells and cytotoxic lymphocytes. Numerous assays 
for such activity are familiar to those skilled in the art, including the assays disclosed in the following 
references, which are incorporated herein by reference: Chapter 3 (In vitro Assays for Mouse 
Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in 
Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Takai et al., 
J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. 
Immunol. 149:3778-3783, 1992. 

The proteins encoded by the cDNAs of the invention may also be evaluated for their effect on 
dendritic cell mediated activation of naive T-cells. Numerous assays for such activity are familiar to those 
skilled in the art, including the assays disclosed in the following references, which are incorporated herein 
by reference: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of Experimental 
Medicine 173:549-559, 1991; Macatonia et al., Journal of Immunology 154:5071-5079, 1995; Porgador et 
al., Journal of Experimental Medicine 182:255-260, 1995; Nair et al., Journal of Virology 67:4062-4069, 
1993; Huang et al., Science 264:961-965, 1994; Macatonia et al., Journal of Experimental Medicine 
169:1255-1264, 1989; Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et 
al., Journal of Experimental Medicine 172:631-640, 1990. 

The proteins encoded by the cDNAs of the invention may also be evaluated for their influence on 
the lifetime of lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, 
including the assays disclosed in the following references, which are incorporated herein by reference: 
Darzynkiewicz et al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca 
et al., Cancer Research 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, Journal of 
Immunology 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; Gorczyca et al., 
International Journal of Oncology 1:639-648, 1992. 

Assays for proteins that influence early steps of T-cell commitment and development include, 
without limitation, those described in: Antica et al., Blood 84:111-117, 1994; Fine et al., Cellular 
Immunology 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; Toki et al., Proc. Nat. Acad. Sci. 
USA 88:7548-7551, 1991. 

Those proteins which exhibit activity as immune system regulators may then be 
formulated as pharmaceuticals and used to treat clinical conditions in which regulation of immune activity 
is beneficial. For example, the protein may be useful in the treatment of various immune deficiencies and 
disorders (including severe combined immunodeficiency (SCID)), e.g., in regulating (up or down) growth 
and proliferation of T and/or B lymphocytes, as well as affecting the cytolytic activity of NK cells and 
other cell populations. These immune deficiencies may be genetic or be caused by viral (e.g., HIV) as 
well as bacterial or fungal infections, or may result from autoimmune disorders. More specifically, 
infectious diseases caused by viral, bacterial, fungal or other infection may be treatable using a protein of 
the present invention, including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, 
Leishmania spp., malaria spp. and various fungal infections such as candidiasis. Of course, in this regard, 
a protein of the present invention may also be useful where a boost to the immune system generally may 
be desirable, i.e., in the treatment of cancer. 

Autoimmune disorders which may be treated using a protein of the present invention include, for 
example, connective tissue disease, multiple sclerosis, systemic lupus erythematosus, rheumatoid arthritis, 
autoimmune pulmonary inflammation, Guillain-Barre syndrome, autoimmune thyroiditis, insulin 
dependent diabetes mellitus, myasthenia gravis, graft-versus-host disease and autoimmune inflammatory 
eye disease. Such a protein of the present invention may also be useful in the treatment of allergic 
reactions and conditions, such as asthma (particularly allergic asthma) or other respiratory problems. 
Other conditions, in which immune suppression is desired (including, for example, organ transplantation), 
may also be treatable using a protein of the present invention. 

Using the proteins of the invention it may also be possible to regulate immune responses in a 
number of ways. Down regulation may be in the form of inhibiting or blocking an immune response 
already in progress or may involve preventing the induction of an immune response. The functions of 
activated T-cells may be inhibited by suppressing T cell responses or by inducing specific tolerance in T 
cells, or both. Immunosuppression of T cell responses is generally an active, non-antigen-specific 
process which requires continuous exposure of the T cells to the suppressive agent. Tolerance, which 
involves inducing non-responsiveness or anergy in T cells, is distinguishable from immunosuppression in 
that it is generally antigen-specific and persists after exposure to the tolerizing agent has ceased. 
Operationally, tolerance can be demonstrated by the lack of a T cell response upon reexposure to specific 
antigen in the absence of the tolerizing agent. 

Down regulating or preventing one or more antigen functions (including without limitation B 
lymphocyte antigen functions (such as, for example, B7)), e.g., preventing high level lymphokine 
synthesis by activated T cells, will be useful in situations of tissue, skin and organ transplantation and in 
graft-versus-host disease (GVHD). For example, blockage of T cell function should result in reduced 
tissue destruction in tissue transplantation. Typically, in tissue transplants, rejection of the transplant is 
initiated through its recognition as foreign by T cells, followed by an immune reaction that destroys the 
transplant. The administration of a molecule which inhibits or blocks interaction of a B7 lymphocyte 
antigen with its natural ligand(s) on immune cells (such as a soluble, monomeric form of a peptide having 
B7-2 activity alone or in conjunction with a monomeric form of a peptide having an activity of another B 
lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior to transplantation can lead to the 
binding of the molecule to the natural ligand(s) on the immune cells without transmitting the 
corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner prevents 
cytokine synthesis by immune cells, such as T cells, and thus acts as an immunosuppressant. Moreover, 
the lack of costimulation may also be sufficient to anergize the T cells, thereby inducing tolerance in a 
subject. Induction of long-term tolerance by B lymphocyte antigen-blocking reagents may avoid the 
necessity of repeated administration of these blocking reagents. To achieve sufficient 
immunosuppression or tolerance in a subject, it may also be necessary to block the function of a 
combination of B lymphocyte antigens. 

The efficacy of particular blocking reagents in preventing organ transplant rejection or GVHD 
can be assessed using animal models that are predictive of efficacy in humans. Examples of appropriate 
systems which can be used include allogeneic cardiac grafts in rats and xenogeneic pancreatic islet cell 
grafts in mice, both of which have been used to examine the immunosuppressive effects of CTLA4Ig 
fusion proteins in vivo as described in Lenschow et al., Science 257:789-792 (1992) and Turka et al., 
Proc. Natl. Acad. Sci. USA, 89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., 
Fundamental Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the 
effect of blocking B lymphocyte antigen function in vivo on the development of that disease. 

Blocking antigen function may also be therapeutically useful for treating autoimmune diseases.
Many autoimmune disorders are the result of inappropriate activation of T cells that are reactive against
self tissue and which promote the production of cytokines and autoantibodies involved in the pathology of
the diseases. Preventing the activation of autoreactive T cells may reduce or eliminate disease symptoms.
Administration of reagents which block costimulation of T cells by disrupting receptor ligand interactions
of B lymphocyte antigens can be used to inhibit T cell activation and prevent production of autoantibodies
or T cell-derived cytokines which may be involved in the disease process. Additionally, blocking
reagents may induce antigen-specific tolerance of autoreactive T cells which could lead to long-term
relief from the disease. The efficacy of blocking reagents in preventing or alleviating autoimmune
disorders can be determined using a number of well-characterized animal models of human autoimmune
diseases. Examples include murine experimental autoimmune encephalitis, systemic lupus erythematosus
in MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes mellitus in NOD
mice and BB rats, and murine experimental myasthenia gravis (see Paul ed., Fundamental Immunology,
Raven Press, New York, 1989, pp. 840-856).

Upregulation of an antigen function (preferably a B lymphocyte antigen function), as a means of
up regulating immune responses, may also be useful in therapy. Upregulation of immune responses may
be in the form of enhancing an existing immune response or eliciting an initial immune response. For
example, enhancing an immune response through stimulating B lymphocyte antigen function may be
useful in cases of viral infection. In addition, systemic viral diseases such as influenza, the common cold,
and encephalitis might be alleviated by the systemic administration of a stimulatory form of B lymphocyte
antigens.

Alternatively, anti-viral immune responses may be enhanced in an infected patient by removing T
cells from the patient, costimulating the T cells in vitro with viral antigen-pulsed APCs either expressing
a peptide of the present invention or together with a stimulatory form of a soluble peptide of the present
invention and reintroducing the in vitro activated T cells into the patient. The infected cells would now
be capable of delivering a costimulatory signal to T cells in vivo, thereby activating the T cells.

In another application, up regulation or enhancement of antigen function (preferably B
lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor cells (e.g.,
sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) transfected with a nucleic acid
encoding at least one peptide of the present invention can be administered to a subject to overcome
tumor-specific tolerance in the subject. If desired, the tumor cell can be transfected to express a
combination of peptides. For example, tumor cells obtained from a patient can be transfected ex vivo with
an expression vector directing the expression of a peptide having B7-2-like activity alone, or in
conjunction with a peptide having B7-1-like activity and/or B7-3-like activity. The transfected tumor
cells are returned to the patient to result in expression of the peptides on the surface of the transfected cell.
Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in vivo.

The presence of the peptide of the present invention having the activity of a B lymphocyte
antigen(s) on the surface of the tumor cell provides the necessary costimulation signal to T cells to induce
a T cell mediated immune response against the transfected tumor cells. In addition, tumor cells which
lack MHC class I or MHC class II molecules, or which fail to reexpress sufficient amounts of MHC
class I or MHC class II molecules, can be transfected with nucleic acids encoding all or a fragment
(e.g., a cytoplasmic-domain truncated fragment) of an MHC class I α chain protein and β2 microglobulin
protein or an MHC class II α chain protein and an MHC class II β chain protein to thereby express MHC
class I or MHC class II proteins on the cell surface. Expression of the appropriate class I or class II
MHC in conjunction with a peptide having the activity of a B lymphocyte antigen (e.g., B7-1, B7-2,
B7-3) induces a T cell mediated immune response against the transfected tumor cell. Optionally, a gene
encoding an antisense construct which blocks expression of an MHC class II associated protein, such as
the invariant chain, can also be cotransfected with a DNA encoding a peptide having the activity of a B
lymphocyte antigen to promote presentation of tumor associated antigens and induce tumor specific
immunity. Thus, the induction of a T cell mediated immune response in a human subject may be
sufficient to overcome tumor-specific tolerance in the subject. Alternatively, as described in more detail
below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be
introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

Assaying GENSET proteins or Fragments Thereof for Hematopoiesis Regulating Activity 

The proteins encoded by the cDNAs of the invention or fragments thereof may also be evaluated
for their hematopoiesis regulating activity. For example, the effect of the proteins on embryonic stem cell
differentiation may be evaluated. Numerous assays for such activity are familiar to those skilled in the
art, including the assays disclosed in the following references, which are incorporated herein by reference:
Johansson et al., Cellular Biology 15:141-151, 1995; Keller et al., Molecular and Cellular Biology
13:473-486, 1993; McClanahan et al., Blood 81:2903-2915, 1993.

The proteins encoded by the cDNAs of the invention or fragments thereof may also be evaluated
for their influence on the lifetime of stem cells and stem cell differentiation. Numerous assays for such
activity are familiar to those skilled in the art, including the assays disclosed in the following references,
which are incorporated herein by reference: Freshney, M.G. Methylcellulose Colony Forming Assays, in
Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 265-268, Wiley-Liss, Inc., New York, NY.
1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; McNiece, I.K. and Briddell, R.A.
Primitive Hematopoietic Colony Forming Cells with High Proliferative Potential, in Culture of
Hematopoietic Cells. R.I. Freshney, et al. Eds. Vol pp. 23-39, Wiley-Liss, Inc., New York, NY. 1994;
Neben et al., Experimental Hematology 22:353-359, 1994; Ploemacher, R.E. Cobblestone Area Forming
Cell Assay, in Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 1-21, Wiley-Liss, Inc., New
York, NY. 1994; Spooncer, E., Dexter, M. and Allen, T. Long Term Bone Marrow Cultures in the
Presence of Stromal Cells, in Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 163-179,
Wiley-Liss, Inc., New York, NY. 1994; and Sutherland, H.J. Long Term Culture Initiating Cell Assay, in
Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 139-162, Wiley-Liss, Inc., New York, NY.
1994.

Those proteins which exhibit hematopoiesis regulatory activity may then be formulated as
pharmaceuticals and used to treat clinical conditions in which regulation of hematopoiesis is beneficial.
For example, a protein of the present invention may be useful in regulation of hematopoiesis and,
consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal biological
activity in support of colony forming cells or of factor-dependent cell lines indicates involvement in
regulating hematopoiesis, e.g., in supporting the growth and proliferation of erythroid progenitor cells
alone or in combination with other cytokines, thereby indicating utility, for example, in treating various
anemias or for use in conjunction with irradiation/chemotherapy to stimulate the production of erythroid
precursors and/or erythroid cells; in supporting the growth and proliferation of myeloid cells such as
granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for example, in
conjunction with chemotherapy to prevent or treat consequent myelo-suppression; in supporting the
growth and proliferation of megakaryocytes and consequently of platelets thereby allowing prevention or
treatment of various platelet disorders such as thrombocytopenia, and generally for use in place of or
complementary to platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic
stem cells which are capable of maturing to any and all of the above-mentioned hematopoietic cells and
therefore find therapeutic utility in various stem cell disorders (such as those usually treated with
transplantation, including, without limitation, aplastic anemia and paroxysmal nocturnal hemoglobinuria),
as well as in repopulating the stem cell compartment post irradiation/chemotherapy, either in vivo or
ex vivo (i.e., in conjunction with bone marrow transplantation or with peripheral progenitor cell
transplantation (homologous or heterologous)) as normal cells or genetically manipulated for gene
therapy. Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids
regulating the expression of these proteins may be introduced into appropriate host cells to increase or
decrease the expression of the proteins as desired.

Assaying GENSET proteins or Fragments Thereof for Regulation of Tissue Growth 

The proteins encoded by the cDNAs of the invention or fragments thereof may also be evaluated
for their effect on tissue growth. Numerous assays for such activity are familiar to those skilled in the art,
including the assays disclosed in International Patent Publication No. WO95/16035, International Patent
Publication No. WO95/05846 and International Patent Publication No. WO91/07491, which are
incorporated herein by reference.

Assays for wound healing activity include, without limitation, those described in: Winter,
Epidermal Wound Healing, pp. 71-112 (Maibach, HI and Rovee, DT, eds.), Year Book Medical
Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. Invest. Dermatol. 71:382-84 (1978),
which are incorporated herein by reference.

Those proteins which are involved in the regulation of tissue growth may then be formulated as
pharmaceuticals and used to treat clinical conditions in which regulation of tissue growth is beneficial.
For example, a protein of the present invention also may have utility in compositions used for bone,
cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for wound healing and
tissue repair and replacement, and in the treatment of burns, incisions and ulcers.

A protein of the present invention, which induces cartilage and/or bone growth in circumstances
where bone is not normally formed, has application in the healing of bone fractures and cartilage damage
or defects in humans and other animals. Such a preparation employing a protein of the invention may
have prophylactic use in closed as well as open fracture reduction and also in the improved fixation of
artificial joints. De novo bone formation induced by an osteogenic agent contributes to the repair of
congenital, trauma induced, or oncologic resection induced craniofacial defects, and also is useful in
cosmetic plastic surgery.

A protein of this invention may also be used in the treatment of periodontal disease, and in other
tooth repair processes. Such agents may provide an environment to attract bone-forming cells, stimulate
growth of bone-forming cells or induce differentiation of progenitors of bone-forming cells. A protein of
the invention may also be useful in the treatment of osteoporosis or osteoarthritis, such as through
stimulation of bone and/or cartilage repair or by blocking inflammation or processes of tissue destruction
(collagenase activity, osteoclast activity, etc.) mediated by inflammatory processes.

Another category of tissue regeneration activity that may be attributable to the protein of the
present invention is tendon/ligament formation. A protein of the present invention, which induces
tendon/ligament-like tissue or other tissue formation in circumstances where such tissue is not normally
formed, has application in the healing of tendon or ligament tears, deformities and other tendon or
ligament defects in humans and other animals. Such a preparation employing a tendon/ligament-like
tissue inducing protein may have prophylactic use in preventing damage to tendon or ligament tissue, as
well as use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing
defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced by a
composition of the present invention contributes to the repair of congenital, trauma induced, or other
tendon or ligament defects of other origin, and is also useful in cosmetic plastic surgery for attachment or
repair of tendons or ligaments. The compositions of the present invention may provide an environment to
attract tendon- or ligament-forming cells, stimulate growth of tendon- or ligament-forming cells, induce
differentiation of progenitors of tendon- or ligament-forming cells, or induce growth of tendon/ligament
cells or progenitors ex vivo for return in vivo to effect tissue repair. The compositions of the invention
may also be useful in the treatment of tendinitis, carpal tunnel syndrome and other tendon or ligament
defects. The compositions may also include an appropriate matrix and/or sequestering agent as a carrier
as is well known in the art.

The protein of the present invention may also be useful for proliferation of neural cells and for
regeneration of nerve and brain tissue, i.e., for the treatment of central and peripheral nervous system
diseases and neuropathies, as well as mechanical and traumatic disorders, which involve degeneration,
death or trauma to neural cells or nerve tissue. More specifically, a protein may be used in the treatment
of diseases of the peripheral nervous system, such as peripheral nerve injuries, peripheral neuropathy and
localized neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's disease,
Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. Further conditions which
may be treated in accordance with the present invention include mechanical and traumatic disorders, such
as spinal cord disorders, head trauma and cerebrovascular diseases such as stroke. Peripheral
neuropathies resulting from chemotherapy or other medical therapies may also be treatable using a protein
of the invention.

Proteins of the invention may also be useful to promote better or faster closure of non-healing 
wounds, including without limitation pressure ulcers, ulcers associated with vascular insufficiency, 
surgical and traumatic wounds, and the like. 

It is expected that a protein of the present invention may also exhibit activity for generation or
regeneration of other tissues, such as organs (including, for example, pancreas, liver, intestine, kidney,
skin, endothelium), muscle (smooth, skeletal or cardiac) and vascular (including vascular endothelium)
tissue, or for promoting the growth of cells comprising such tissues. Part of the desired effects may be by
inhibition or modulation of fibrotic scarring to allow normal tissue to generate. A protein of the invention
may also exhibit angiogenic activity.

A protein of the present invention may also be useful for gut protection or regeneration and 
treatment of lung or liver fibrosis, reperfusion injury in various tissues, and conditions resulting from 
systemic cytokine damage. 

A protein of the present invention may also be useful for promoting or inhibiting differentiation
of tissues described above from precursor tissues or cells; or for inhibiting the growth of tissues described
above.

Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids 
regulating the expression of these proteins may be introduced into appropriate host cells to increase or 
decrease the expression of the proteins as desired.

Assaying GENSET proteins or Fragments Thereof for Regulation of Reproductive Hormones or Cell
Movement

The proteins encoded by the cDNAs of the invention or fragments thereof may also be evaluated
for their ability to regulate reproductive hormones, such as follicle stimulating hormone. Numerous
assays for such activity are familiar to those skilled in the art, including the assays disclosed in the
following references, which are incorporated herein by reference: Vale et al., Endocrinology 91:562-572,
1972; Ling et al., Nature 321:779-782, 1986; Vale et al., Nature 321:776-779, 1986; Mason et al., Nature
318:659-663, 1985; Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986; Chapter 6.12
(Measurement of Alpha and Beta Chemokines) Current Protocols in Immunology, J.E. Coligan et al. Eds.
Greene Publishing Associates and Wiley-Interscience; Taub et al., J. Clin. Invest. 95:1370-1376, 1995;
Lind et al., APMIS 103:140-146, 1995; Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. of
Immunol. 152:5860-5867, 1994; Johnston et al., J. of Immunol. 153:1762-1768, 1994.

Those proteins which exhibit activity as reproductive hormones or regulators of cell movement
may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of
reproductive hormones or cell movement is beneficial. For example, a protein of the present invention
may also exhibit activin- or inhibin-related activities. Inhibins are characterized by their ability to inhibit
the release of follicle stimulating hormone (FSH), while activins are characterized by their ability to
stimulate the release of follicle stimulating hormone (FSH). Thus, a protein of the present invention, alone
or in heterodimers with a member of the inhibin α family, may be useful as a contraceptive based on the
ability of inhibins to decrease fertility in female mammals and decrease spermatogenesis in male
mammals. Administration of sufficient amounts of other inhibins can induce infertility in these
mammals. Alternatively, the protein of the invention, as a homodimer or as a heterodimer with other
protein subunits of the inhibin-β group, may be useful as a fertility inducing therapeutic, based upon the
ability of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for
example, United States Patent 4,798,885, the disclosure of which is incorporated herein by reference. A
protein of the invention may also be useful for advancement of the onset of fertility in sexually immature
mammals, so as to increase the lifetime reproductive performance of domestic animals such as cows,
sheep and pigs.

Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids
regulating the expression of these proteins may be introduced into appropriate host cells to increase or
decrease the expression of the proteins as desired.

Assaying GENSET proteins or Fragments Thereof for Chemotactic/Chemokinetic Activity

The proteins encoded by the cDNAs of the invention or fragments thereof may also be evaluated
for chemotactic/chemokinetic activity. For example, a protein of the present invention may have
chemotactic or chemokinetic activity (e.g., act as a chemokine) for mammalian cells, including, for
example, monocytes, fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial
cells. Chemotactic and chemokinetic proteins can be used to mobilize or attract a desired cell population
to a desired site of action. Chemotactic or chemokinetic proteins provide particular advantages in
treatment of wounds and other trauma to tissues, as well as in treatment of localized infections. For
example, attraction of lymphocytes, monocytes or neutrophils to tumors or sites of infection may result in
improved immune responses against the tumor or infecting agent.

A protein or peptide has chemotactic activity for a particular cell population if it can stimulate,
directly or indirectly, the directed orientation or movement of such cell population. Preferably, the
protein or peptide has the ability to directly stimulate directed movement of cells. Whether a particular
protein has chemotactic activity for a population of cells can be readily determined by employing such
protein or peptide in any known assay for cell chemotaxis.

The activity of a protein of the invention may, among other means, be measured by the following 
methods: 

Assays for chemotactic activity (which will identify proteins that induce or prevent chemotaxis)
consist of assays that measure the ability of a protein to induce the migration of cells across a membrane
as well as the ability of a protein to induce the adhesion of one cell population to another cell population.
Suitable assays for movement and adhesion include, without limitation, those described in: Current
Protocols in Immunology, Ed. by J.E. Coligan, A.M. Kruisbeek, D.H. Margulies, E.M. Shevach, W.
Strober, Pub. Greene Publishing Associates and Wiley-Interscience (Chapter 6.12, Measurement of alpha
and beta Chemokines 6.12.1-6.12.28); Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et al., APMIS
103:140-146, 1995; Mueller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. of Immunol.
152:5860-5867, 1994; Johnston et al., J. of Immunol. 153:1762-1768, 1994.

Assaying GENSET proteins or Fragments Thereof for Regulation of Blood Clotting 

The proteins encoded by the cDNAs of the invention or fragments thereof may also be evaluated
for their effects on blood clotting. Numerous assays for such activity are familiar to those skilled in the
art, including the assays disclosed in the following references, which are incorporated herein by reference:
Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis Res. 45:413-419, 1987;
Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub, Prostaglandins 35:467-474, 1988.

Those proteins which are involved in the regulation of blood clotting may then be formulated as
pharmaceuticals and used to treat clinical conditions in which regulation of blood clotting is beneficial.
For example, a protein of the invention may also exhibit hemostatic or thrombolytic activity. As a result,
such a protein is expected to be useful in treatment of various coagulation disorders (including hereditary
disorders, such as hemophilias) or to enhance coagulation and other hemostatic events in treating wounds
resulting from trauma, surgery or other causes. A protein of the invention may also be useful for
dissolving or inhibiting formation of thromboses and for treatment and prevention of conditions resulting
therefrom (such as, for example, infarction of cardiac and central nervous system vessels (e.g., stroke)).
Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating
the expression of these proteins may be introduced into appropriate host cells to increase or decrease the
expression of the proteins as desired.

Assaying GENSET proteins or Fragments Thereof for Involvement in Receptor/Ligand Interactions 

The proteins encoded by the cDNAs or a fragment thereof may also be evaluated for their
involvement in receptor/ligand interactions. Numerous assays for such involvement are familiar to those
skilled in the art, including the assays disclosed in the following references, which are incorporated herein
by reference: Chapter 7.28 (Measurement of Cellular Adhesion under Static Conditions 7.28.1-7.28.22) in
Current Protocols in Immunology, J.E. Coligan et al. Eds. Greene Publishing Associates and
Wiley-Interscience; Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al., J. Exp. Med.
168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160, 1989; Stoltenborg et al., J. Immunol.
Methods 175:59-68, 1994; Stitt et al., Cell 80:661-670, 1995; Gyuris et al., Cell 75:791-803, 1993.

For example, the proteins of the present invention may also demonstrate activity as receptors,
receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such receptors and
ligands include, without limitation, cytokine receptors and their ligands, receptor kinases and their
ligands, receptor phosphatases and their ligands, receptors involved in cell-cell interactions and their
ligands (including without limitation, cellular adhesion molecules (such as selectins, integrins and their
ligands) and receptor/ligand pairs involved in antigen presentation, antigen recognition and development
of cellular and humoral immune responses). Receptors and ligands are also useful for screening of
potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. Proteins of the
present invention (including, without limitation, fragments of receptors and ligands) may themselves be
useful as inhibitors of receptor/ligand interactions.

Assaying GENSET proteins or Fragments Thereof for Anti-Inflammatory Activity

The proteins encoded by the cDNAs or a fragment thereof may also be evaluated for
anti-inflammatory activity. The anti-inflammatory activity may be achieved by providing a stimulus to cells
involved in the inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for
example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the inflammatory
process, inhibiting or promoting cell extravasation, or by stimulating or suppressing production of other
factors which more directly inhibit or promote an inflammatory response. Proteins exhibiting such
activities can be used to treat inflammatory conditions (including chronic or acute conditions), including
without limitation inflammation associated with infection (such as septic shock, sepsis or systemic
inflammatory response syndrome (SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis,
complement-mediated hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury,
inflammatory bowel disease, Crohn's disease or resulting from over production of cytokines such as TNF
or IL-1. Proteins of the invention may also be useful to treat anaphylaxis and hypersensitivity to an
antigenic substance or material.

Assaying GENSET proteins or Fragments Thereof for Tumor Inhibition Activity

The proteins encoded by the cDNAs of the invention or a fragment thereof may also be evaluated
for tumor inhibition activity. In addition to the activities described above for immunological treatment or
prevention of tumors, a protein of the invention may exhibit other anti-tumor activities. A protein may
inhibit tumor growth directly or indirectly (such as, for example, via ADCC). A protein may exhibit its
tumor inhibitory activity by acting on tumor tissue or tumor precursor tissue, by inhibiting formation of
tissues necessary to support tumor growth (such as, for example, by inhibiting angiogenesis), by causing
production of other factors, agents or cell types which inhibit tumor growth, or by suppressing,
eliminating or inhibiting factors, agents or cell types which promote tumor growth.

A protein of the invention may also exhibit one or more of the following additional activities or
effects: inhibiting the growth, infection or function of, or killing, infectious agents, including, without
limitation, bacteria, viruses, fungi and other parasites; affecting (suppressing or enhancing) bodily
characteristics, including, without limitation, height, weight, hair color, eye color, skin, fat to lean ratio or
other tissue pigmentation, or organ or body part size or shape (such as, for example, breast augmentation
or diminution, change in bone form or shape); affecting biorhythms or circadian cycles or rhythms;
affecting the fertility of male or female subjects; affecting the metabolism, catabolism, anabolism,
processing, utilization, storage or elimination of dietary fat, lipid, protein, carbohydrate, vitamins,
minerals, cofactors or other nutritional factors or component(s); affecting behavioral characteristics,
including, without limitation, appetite, libido, stress, cognition (including cognitive disorders), depression
(including depressive disorders) and violent behaviors; providing analgesic effects or other pain reducing
effects; promoting differentiation and growth of embryonic stem cells in lineages other than
hematopoietic lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of
the enzyme and treating deficiency-related diseases; treatment of hyperproliferative disorders (such as, for
example, psoriasis); immunoglobulin-like activity (such as, for example, the ability to bind antigens or
complement); and the ability to act as an antigen in a vaccine composition to raise an immune response
against such protein or another material or entity which is cross-reactive with such protein.

Table I



Seq Id No  Internal designation    Type  Vector
1          119-003-4-0-C2-CS       DNA   pBluescriptII SK-
2          105-016-1-0-D3-CS       DNA   pBluescriptII SK-
3          105-016-3-0-G10-CS      DNA   pBluescriptII SK-
4          105-026-1-0-A5-CS       DNA   pBluescriptII SK-
5          105-031-1-0-A11-CS      DNA   pBluescriptII SK-
6          105-031-2-0-D3-CS       DNA   pBluescriptII SK-
7          105-035-2-0-C6-CS       DNA   pBluescriptII SK-
8          105-037-2-0-H11-CS      DNA   pBluescriptII SK-
9          105-053-4-0-E8-CS       DNA   pBluescriptII SK-
10         105-074-3-0-H10-CS      DNA   pBluescriptII SK-
11         105-089-3-0-G10-CS      DNA   pBluescriptII SK-
12         105-095-2-0-G11-CS      DNA   pBluescriptII SK-
13         106-006-1-0-E3-CS       DNA   pBluescriptII SK-
14         106-037-1-0-E9-CS.cor   DNA   pBluescriptII SK-
15         106-037-1-0-E9-CS.fr    DNA   pBluescriptII SK-
16         106-043-4-0-H3-CS       DNA   pBluescriptII SK-
17         110-007-1-0-C7-CS       DNA   pBluescriptII SK-
18         114-016-1-0-H8-CS       DNA   pBluescriptII SK-
19         116-004-3-0-A6-CS       DNA   pBluescriptII SK-
20         116-054-3-0-E6-CS       DNA   pBluescriptII SK-
21         116-055-1-0-A3-CS       DNA   pBluescriptII SK-
22         116-055-2-0-F7-CS       DNA   pBluescriptII SK-
23         116-088-4-0-A9-CS       DNA   pBluescriptII SK-
24         116-091-1-0-D9-CS       DNA   pBluescriptII SK-
25         116-110-2-0-F4-CS       DNA   pBluescriptII SK-
26         116-111-1-0-H9-CS       DNA   pBluescriptII SK-
27         116-111-4-0-B3-CS       DNA   pBluescriptII SK-
28         116-115-2-0-F8-CS       DNA   pBluescriptII SK-
29         116-119-3-0-H5-CS       DNA   pBluescriptII SK-
30         117-001-5-0-G3-CS       DNA   pBluescriptII SK-
31         145-25-3-0-B4-CS.cor    DNA   pBluescriptII SK-
32         145-25-3-0-B4-CS.fr     DNA   pBluescriptII SK-
33         145-56-3-0-D5-CS        DNA   pBluescriptII SK-
34         145-59-2-0-A7-CS        DNA   pBluescriptII SK-
35         157-15-4-0-B11-CS       DNA   pBluescriptII SK-
36         160-103-1-0-F11-CS      DNA   pBluescriptII SK-
37         160-37-2-0-H7-CS        DNA   pBluescriptII SK-
38         160-58-3-0-H3-CS        DNA   pBluescriptII SK-
39         160-75-4-0-A9-CS        DNA   pBluescriptII SK-
40         174-10-2-0-F8-CS        DNA   pPT
41         174-33-3-0-F6-CS        DNA   pPT
42         174-38-1-0-B6-CS        DNA   pPT
43         174-38-3-0-C9-CS        DNA   pPT
44         174-39-2-0-A3-CS        DNA   pPT
45         174-41-1-0-A6-CS        DNA   pPT
46         174-5-3-0-H7-CS         DNA   pPT
47         174-7-4-0-H1-CS         DNA   pPT
48         175-1-3-0-E5-CS.cor     DNA   pPT
49         175-1-3-0-E5-CS.fr      DNA   pPT
50         180-19-4-0-F4-CS        DNA   pBluescriptII SK-
51         181-10-1-0-D10-CS       DNA   pBluescriptII SK-
52         181-16-1-0-G7-CS        DNA   pBluescriptII SK-
53         181-16-2-0-A7-CS        DNA   pBluescriptII SK-
54         181-20-3-0-B5-CS        DNA   pBluescriptII SK-
55         181-3-3-0-B8-CS         DNA   pBluescriptII SK-
56         181-3-3-0-C9-CS         DNA   pBluescriptII SK-
57         182-1-2-0-D12-CS        DNA   pBluescriptII SK-
58         184-1-4-0-C11-CS        DNA   pBluescriptII SK-
59         184-4-1-0-A11-CS        DNA   pBluescriptII SK-
60         187-12-4-0-A8-CS        DNA   pBluescriptII SK-
61         187-2-2-0-A3-CS         DNA   pBluescriptII SK-
62         187-31-0-0-f12-CS       DNA   pBluescriptII SK-
63         187-34-0-0-l12-CS       DNA   pBluescriptII SK-
64         187-37-0-0-c10-CS       DNA   pBluescriptII SK-
65         187-38-0-0-l10-CS       DNA   pBluescriptII SK-
66         187-39-0-0-k12-CS       DNA   pBluescriptII SK-
67         187-41-0-0-i21-CS       DNA   pBluescriptII SK-
68         188-11-1-0-B3-CS        DNA   pBluescriptII SK-
69         188-18-4-0-A9-CS        DNA   pBluescriptII SK-
70         188-28-4-0-B12-CS.cor   DNA   pBluescriptII SK-
71         188-28-4-0-B12-CS.fr    DNA   pBluescriptII SK-
72         188-28-4-0-D4-CS        DNA   pBluescriptII SK-
73         188-41-1-0-B8-CS.cor    DNA   pBluescriptII SK-
74         188-41-1-0-B8-CS.fr     DNA   pBluescriptII SK-
75         188-45-1-0-D9-CS        DNA   pBluescriptII SK-
76         188-9-2-0-F1-CS         DNA   pBluescriptII SK-
77         105-079-3-0-A11-CS      DNA   pBluescriptII SK-
78         105-092-1-0-H7-CS       DNA   pBluescriptII SK-
79         105-141-4-0-H9-CS       DNA   pBluescriptII SK-
80         109-013-1-0-B9-CS       DNA   pBluescriptII SK-
81         110-008-4-0-D9-CS       DNA   pBluescriptII SK-
82         114-001-3-0-A2-CS       DNA   pBluescriptII SK-
83         114-028-2-0-C1-CS       DNA   pBluescriptII SK-
84         114-037-1-0-H10-CS      DNA   pBluescriptII SK-
85         114-043-2-0-A10-CS      DNA   pBluescriptII SK-
86         114-044-1-0-C5-CS       DNA   pBluescriptII SK-
87         116-003-3-0-D10-CS      DNA   pBluescriptII SK-
88         116-003-3-0-G12-CS      DNA   pBluescriptII SK-
89         116-011-2-0-F11-CS      DNA   pBluescriptII SK-
90         116-033-3-0-E4-CS       DNA   pBluescriptII SK-
91         116-041-4-0-B6-CS       DNA   pBluescriptII SK-
92         116-044-2-0-C4-CS       DNA   pBluescriptII SK-
93         116-075-1-0-E6-CS       DNA   pBluescriptII SK-
94         116-094-4-0-G5-CS       DNA   pBluescriptII SK-
95         117-005-3-0-F2-CS       DNA   pBluescriptII SK-
96         121-007-3-0-D9-CS       DNA   pBluescriptII SK-
97         145-91-3-0-D10-CS       DNA   pBluescriptII SK-
98         157-17-1-0-F4-CS        DNA   pBluescriptII SK-
99         160-11-3-0-G8-CS        DNA   pBluescriptII SK-
100        160-24-1-0-F12-CS       DNA   pBluescriptII SK-
101        160-24-2-0-E9-CS        DNA   pBluescriptII SK-
102        160-25-4-0-D2-CS        DNA   pBluescriptII SK-
103        160-31-3-0-A11-CS       DNA   pBluescriptII SK-
104        160-32-1-0-F6-CS        DNA   pBluescriptII SK-
105        160-37-1-0-A3-CS        DNA   pBluescriptII SK-
106        160-40-3-0-E9-CS        DNA   pBluescriptII SK-
107        160-58-3-0-E4-CS        DNA   pBluescriptII SK-
108        160-85-3-0-D4-CS        DNA   pBluescriptII SK-
109        160-95-3-0-A11-CS       DNA   pBluescriptII SK-
110        162-10-4-0-F9-CS.cor    DNA   pBluescriptII SK-
111        162-10-4-0-F9-CS.fr     DNA   pBluescriptII SK-
112        174-13-2-0-E4-CS        DNA   pPT
113        174-46-2-0-B11-CS       DNA   pPT
114        179-8-2-0-A6-CS         DNA   pBluescriptII SK-
115        180-22-3-0-B6-CS        DNA   pBluescriptII SK-
116        181-13-1-0-F7-CS        DNA   pBluescriptII SK-
117        181-15-4-0-F7-CS        DNA   pBluescriptII SK-
118        181-20-1-0-G7-CS        DNA   pBluescriptII SK-
119        184-15-3-0-D1-CS        DNA   pBluescriptII SK-
120        187-12-2-0-G11-CS       DNA   pBluescriptII SK-
121        187-2-2-0-A12-CS        DNA   pBluescriptII SK-
122        187-30-0-0-k23-CS       DNA   pBluescriptII SK-
123        187-36-0-0-e19-CS       DNA   pBluescriptII SK-
124        187-38-0-0-d22-CS       DNA   pBluescriptII SK-
125        187-39-0-0-b9-CS        DNA   pBluescriptII SK-
126        187-39-0-0-g6-CS        DNA   pBluescriptII SK-
127        187-45-0-0-l18-CS       DNA   pBluescriptII SK-
128        187-45-0-0-m21-CS       DNA   pBluescriptII SK-
129        187-45-0-0-n8-CS        DNA   pBluescriptII SK-
130        187-46-0-0-f23-CS       DNA   pBluescriptII SK-
131        187-5-1-0-A12-CS        DNA   pBluescriptII SK-
132        187-5-1-0-F6-CS         DNA   pBluescriptII SK-
133        187-5-2-0-B2-CS         DNA   pBluescriptII SK-
134        187-5-3-0-D5-CS         DNA   pBluescriptII SK-
135        187-51-0-0-f9-CS        DNA   pBluescriptII SK-
136        187-6-1-0-B9-CS         DNA   pBluescriptII SK-
137        187-6-4-0-C10-CS        DNA   pBluescriptII SK-
138        188-19-2-0-C8-CS        DNA   pBluescriptII SK-
139        188-22-4-0-G6-CS        DNA   pBluescriptII SK-
140        188-28-4-0-D11-CS       DNA   pBluescriptII SK-
141        188-29-1-0-E10-CS       DNA   pBluescriptII SK-
142        188-34-4-0-E5-CS        DNA   pBluescriptII SK-
143        188-9-3-0-A5-CS         DNA   pBluescriptII SK-
144        105-021-3-0-C3-CS       DNA   pBluescriptII SK-
145        105-037-4-0-H12-CS      DNA   pBluescriptII SK-
146        105-073-2-0-A7-CS       DNA   pBluescriptII SK-
147        109-002-4-0-C6-CS       DNA   pBluescriptII SK-
148        109-003-1-0-G4-CS       DNA   pBluescriptII SK-
149        116-118-4-0-A8-CS       DNA   pBluescriptII SK-
150        145-52-2-0-D12-CS       DNA   pBluescriptII SK-
151        145-7-2-0-G5-CS         DNA   pBluescriptII SK-
152        145-7-3-0-D3-CS         DNA   pBluescriptII SK-
153        157-17-2-0-C1-CS        DNA   pBluescriptII SK-
154        160-101-3-0-H2-CS       DNA   pBluescriptII SK-
155        160-12-1-0-D10-CS       DNA   pBluescriptII SK-
156        160-28-4-0-C4-CS        DNA   pBluescriptII SK-
157        160-31-3-0-E4-CS        DNA   pBluescriptII SK-
158        160-40-1-0-H4-CS        DNA   pBluescriptII SK-
159        160-54-1-0-F7-CS        DNA   pBluescriptII SK-
160        160-88-3-0-A8-CS.cor    DNA   pBluescriptII SK-
161        160-88-3-0-A8-CS.fr     DNA   pBluescriptII SK-
162        160-99-4-0-E4-CS        DNA   pBluescriptII SK-
163        161-5-4-0-B6-CS         DNA   pBluescriptII SK-
164        174-17-1-0-D6-CS        DNA   pPT
165        174-32-4-0-F8-CS        DNA   pPT
166        174-38-4-0-D11-CS       DNA   pPT
167        174-8-2-0-C10-CS        DNA   pPT
168        179-14-2-0-F11-CS       DNA   pBluescriptII SK-
169        179-9-4-0-B8-CS         DNA   pBluescriptII SK-
170        181-10-1-0-C9-CS        DNA   pBluescriptII SK-
171        187-5-3-0-C7-CS         DNA   pBluescriptII SK-
172        188-26-4-0-F5-CS        DNA   pBluescriptII SK-
173        188-27-3-0-G1-CS        DNA   pBluescriptII SK-
174        188-29-2-0-H1-CS        DNA   pBluescriptII SK-
175        188-31-1-0-E6-CS        DNA   pBluescriptII SK-
176        188-45-1-0-D3-CS        DNA   pBluescriptII SK-
177        188-5-1-0-H6-CS         DNA   pBluescriptII SK-
178        188-9-1-0-C10-CS        DNA   pBluescriptII SK-
179        105-016-3-0-C5-CS       DNA   pBluescriptII SK-
180        105-026-4-0-D9-CS       DNA   pBluescriptII SK-
181        105-053-2-0-D9-CS       DNA   pBluescriptII SK-
182        105-069-3-0-A11-CS      DNA   pBluescriptII SK-
183        105-076-4-0-F6-CS       DNA   pBluescriptII SK-
184        105-135-2-0-F9-CS       DNA   pBluescriptII SK-
185        106-023-4-0-F6-CS       DNA   pBluescriptII SK-
186        110-001-3-0-C11-CS      DNA   pBluescriptII SK-
187        110-002-3-0-F9-CS       DNA   pBluescriptII SK-
188        114-019-3-0-D9-CS       DNA   pBluescriptII SK-
189        114-029-1-0-C6-CS       DNA   pBluescriptII SK-
190        114-032-4-0-B1-CS       DNA   pBluescriptII SK-
191        114-070-2-0-H4-CS       DNA   pBluescriptII SK-
192        116-016-3-0-F11-CS      DNA   pBluescriptII SK-
193        116-022-4-0-G2-CS       DNA   pBluescriptII SK-
194        116-052-2-0-H8-CS       DNA   pBluescriptII SK-
195        116-053-4-0-B4-CS       DNA   pBluescriptII SK-
196        116-094-3-0-H2-CS       DNA   pBluescriptII SK-
197        116-112-4-0-C7-CS       DNA   pBluescriptII SK-
198        116-123-3-0-F12-CS      DNA   pBluescriptII SK-
199        123-008-1-0-C5-CS       DNA   pBluescriptII SK-
200        145-53-2-0-H8-CS        DNA   pBluescriptII SK-
201        145-57-2-0-C9-CS.cor    DNA   pBluescriptII SK-
202        145-57-2-0-C9-CS.fr     DNA   pBluescriptII SK-
203        145-7-3-0-B12-CS        DNA   pBluescriptII SK-
204        157-12-2-0-D1-CS        DNA   pBluescriptII SK-
205        157-16-2-0-D5-CS        DNA   pBluescriptII SK-
206        157-18-2-0-A7-CS        DNA   pBluescriptII SK-
207        160-103-1-0-B10-CS      DNA   pBluescriptII SK-
208        160-104-4-0-F3-CS       DNA   pBluescriptII SK-
209        160-22-2-0-D10-CS       DNA   pBluescriptII SK-
210        160-24-3-0-F12-CS       DNA   pBluescriptII SK-
211        160-3-2-0-H3-CS         DNA   pBluescriptII SK-
212        160-58-2-0-A2-CS        DNA   pBluescriptII SK-
213        160-73-1-0-B4-CS        DNA   pBluescriptII SK-
214        160-75-4-0-E6-CS        DNA   pBluescriptII SK-
215        160-97-3-0-F9-CS        DNA   pBluescriptII SK-
216        174-1-4-0-E9-CS         DNA   pPT
217        174-12-4-0-C2-CS        DNA   pPT
218        180-19-4-0-H2-CS        DNA   pBluescriptII SK-
219        181-10-4-0-G12-CS       DNA   pBluescriptII SK-
220        181-3-2-0-F6-CS         DNA   pBluescriptII SK-
221        181-4-4-0-A12-CS        DNA   pBluescriptII SK-
222        181-9-2-0-F12-CS.cor    DNA   pBluescriptII SK-
223        181-9-2-0-F12-CS.fr     DNA   pBluescriptII SK-
224        184-13-3-0-E11-CS       DNA   pBluescriptII SK-
225        184-4-2-0-D3-CS         DNA   pBluescriptII SK-
226        184-7-1-0-E7-CS         DNA   pBluescriptII SK-
227        184-8-4-0-G9-CS         DNA   pBluescriptII SK-
228        187-10-3-0-G9-CS        DNA   pBluescriptII SK-
229        187-32-0-0-m20-CS       DNA   pBluescriptII SK-
230        187-32-0-0-n21-CS.cor   DNA   pBluescriptII SK-
231        187-32-0-0-n21-CS.fr    DNA   pBluescriptII SK-
232        187-4-2-0-E6-CS         DNA   pBluescriptII SK-
233        187-40-0-0-i15-CS       DNA   pBluescriptII SK-
234        187-47-0-0-g24-CS       DNA   pBluescriptII SK-
235        187-9-3-0-A2-CS         DNA   pBluescriptII SK-
236        188-26-4-0-H1-CS        DNA   pBluescriptII SK-
237        188-35-3-0-G9-CS        DNA   pBluescriptII SK-
238        188-38-4-0-D8-CS        DNA   pBluescriptII SK-
239        188-41-1-0-E6-CS        DNA   pBluescriptII SK-
240        188-42-2-0-F3-CS.cor    DNA   pBluescriptII SK-
241        188-42-2-0-F3-CS.fr     DNA   pBluescriptII SK-
242        119-003-4-0-C2-CS       PRT   pBluescriptII SK-
243        105-016-1-0-D3-CS       PRT   pBluescriptII SK-
244        105-016-3-0-G10-CS      PRT   pBluescriptII SK-
245        105-026-1-0-A5-CS       PRT   pBluescriptII SK-
246        105-031-1-0-A11-CS      PRT   pBluescriptII SK-
247        105-031-2-0-D3-CS       PRT   pBluescriptII SK-
248        105-035-2-0-C6-CS       PRT   pBluescriptII SK-
249        105-037-2-0-H11-CS      PRT   pBluescriptII SK-
250        105-053-4-0-E8-CS       PRT   pBluescriptII SK-
251        105-074-3-0-H10-CS      PRT   pBluescriptII SK-
252        105-089-3-0-G10-CS      PRT   pBluescriptII SK-
253        105-095-2-0-G11-CS      PRT   pBluescriptII SK-
254        106-006-1-0-E3-CS       PRT   pBluescriptII SK-
255        106-037-1-0-E9-CS.cor   PRT   pBluescriptII SK-
256        106-037-1-0-E9-CS.fr    PRT   pBluescriptII SK-
257        106-043-4-0-H3-CS       PRT   pBluescriptII SK-
258        110-007-1-0-C7-CS       PRT   pBluescriptII SK-
259        114-016-1-0-H8-CS       PRT   pBluescriptII SK-
260        116-004-3-0-A6-CS       PRT   pBluescriptII SK-
261        116-054-3-0-E6-CS       PRT   pBluescriptII SK-
262        116-055-1-0-A3-CS       PRT   pBluescriptII SK-
263        116-055-2-0-F7-CS       PRT   pBluescriptII SK-
264        116-088-4-0-A9-CS       PRT   pBluescriptII SK-
265        116-091-1-0-D9-CS       PRT   pBluescriptII SK-
266        116-110-2-0-F4-CS       PRT   pBluescriptII SK-
267        116-111-1-0-H9-CS       PRT   pBluescriptII SK-
268        116-111-4-0-B3-CS       PRT   pBluescriptII SK-
269        116-115-2-0-F8-CS       PRT   pBluescriptII SK-
270        116-119-3-0-H5-CS       PRT   pBluescriptII SK-
271        117-001-5-0-G3-CS       PRT   pBluescriptII SK-
272        145-25-3-0-B4-CS.cor    PRT   pBluescriptII SK-
273        145-25-3-0-B4-CS.fr     PRT   pBluescriptII SK-
274        145-56-3-0-D5-CS        PRT   pBluescriptII SK-
275        145-59-2-0-A7-CS        PRT   pBluescriptII SK-
276        157-15-4-0-B11-CS       PRT   pBluescriptII SK-
277        160-103-1-0-F11-CS      PRT   pBluescriptII SK-
278        160-37-2-0-H7-CS        PRT   pBluescriptII SK-
279        160-58-3-0-H3-CS        PRT   pBluescriptII SK-
280        160-75-4-0-A9-CS        PRT   pBluescriptII SK-
281        174-10-2-0-F8-CS        PRT   pPT
282        174-33-3-0-F6-CS        PRT   pPT
283        174-38-1-0-B6-CS        PRT   pPT
284        174-38-3-0-C9-CS        PRT   pPT
285        174-39-2-0-A3-CS        PRT   pPT
286        174-41-1-0-A6-CS        PRT   pPT
287        174-5-3-0-H7-CS         PRT   pPT
288        174-7-4-0-H1-CS         PRT   pPT
289        175-1-3-0-E5-CS.cor     PRT   pPT
290        175-1-3-0-E5-CS.fr      PRT   pPT
291        180-19-4-0-F4-CS        PRT   pBluescriptII SK-
292        181-10-1-0-D10-CS       PRT   pBluescriptII SK-
293        181-16-1-0-G7-CS        PRT   pBluescriptII SK-
294        181-16-2-0-A7-CS        PRT   pBluescriptII SK-
295        181-20-3-0-B5-CS        PRT   pBluescriptII SK-
296        181-3-3-0-B8-CS         PRT   pBluescriptII SK-
297        181-3-3-0-C9-CS         PRT   pBluescriptII SK-
298        182-1-2-0-D12-CS        PRT   pBluescriptII SK-
299        184-1-4-0-C11-CS        PRT   pBluescriptII SK-
300        184-4-1-0-A11-CS        PRT   pBluescriptII SK-
301        187-12-4-0-A8-CS        PRT   pBluescriptII SK-
302        187-2-2-0-A3-CS         PRT   pBluescriptII SK-
303        187-31-0-0-A2-CS        PRT   pBluescriptII SK-
304        187-34-0-0-l12-CS       PRT   pBluescriptII SK-
305        187-37-0-0-c10-CS       PRT   pBluescriptII SK-
306        187-38-0-0-l10-CS       PRT   pBluescriptII SK-
307        187-39-0-0-k12-CS       PRT   pBluescriptII SK-
308        187-41-0-0-i21-CS       PRT   pBluescriptII SK-
309        188-11-1-0-B3-CS        PRT   pBluescriptII SK-
310        188-18-4-0-A9-CS        PRT   pBluescriptII SK-
311        188-28-4-0-B12-CS.cor   PRT   pBluescriptII SK-
312        188-28-4-0-B12-CS.fr    PRT   pBluescriptII SK-
313        188-28-4-0-D4-CS        PRT   pBluescriptII SK-
314        188-41-1-0-B8-CS.cor    PRT   pBluescriptII SK-
315        188-41-1-0-B8-CS.fr     PRT   pBluescriptII SK-
316        188-45-1-0-D9-CS        PRT   pBluescriptII SK-
317        188-9-2-0-F1-CS         PRT   pBluescriptII SK-
318        105-079-3-0-A11-CS      PRT   pBluescriptII SK-
319        105-092-1-0-H7-CS       PRT   pBluescriptII SK-
320        105-141-4-0-H9-CS       PRT   pBluescriptII SK-
321        109-013-1-0-B9-CS       PRT   pBluescriptII SK-
322        110-008-4-0-D9-CS       PRT   pBluescriptII SK-
323        114-001-3-0-A2-CS       PRT   pBluescriptII SK-
324        114-028-2-0-C1-CS       PRT   pBluescriptII SK-
325        114-032-1-0-H10-CS      PRT   pBluescriptII SK-
326        114-043-2-0-A10-CS      PRT   pBluescriptII SK-
327        114-044-1-0-C5-CS       PRT   pBluescriptII SK-
328        116-003-3-0-D10-CS      PRT   pBluescriptII SK-
329        116-003-3-0-G12-CS      PRT   pBluescriptII SK-
330        116-011-2-0-F11-CS      PRT   pBluescriptII SK-
331        116-033-3-0-E4-CS       PRT   pBluescriptII SK-
332        116-041-4-0-B6-CS       PRT   pBluescriptII SK-
333        116-044-2-0-C4-CS       PRT   pBluescriptII SK-
334        116-075-1-0-E6-CS       PRT   pBluescriptII SK-
335        116-094-4-0-G5-CS       PRT   pBluescriptII SK-
336        117-005-3-0-F2-CS       PRT   pBluescriptII SK-
337        121-007-3-0-D9-CS       PRT   pBluescriptII SK-
338        145-91-3-0-D10-CS       PRT   pBluescriptII SK-
339        157-17-1-0-F4-CS        PRT   pBluescriptII SK-
340        160-11-3-0-G8-CS        PRT   pBluescriptII SK-
341        160-24-1-0-F12-CS       PRT   pBluescriptII SK-
342        160-24-2-0-E9-CS        PRT   pBluescriptII SK-
343        160-25-4-0-D2-CS        PRT   pBluescriptII SK-
344        160-31-3-0-A11-CS       PRT   pBluescriptII SK-
345        160-32-1-0-F6-CS        PRT   pBluescriptII SK-
346        160-37-1-0-A3-CS        PRT   pBluescriptII SK-
347        160-40-3-0-E9-CS        PRT   pBluescriptII SK-
348        160-58-3-0-E4-CS        PRT   pBluescriptII SK-
349        160-85-3-0-D4-CS        PRT   pBluescriptII SK-
350        160-95-3-0-A11-CS       PRT   pBluescriptII SK-
351        162-10-4-0-F9-CS.cor    PRT   pBluescriptII SK-
352        162-10-4-0-F9-CS.fr     PRT   pBluescriptII SK-
353        174-13-2-0-E4-CS        PRT   pPT
354        174-46-2-0-B11-CS       PRT   pPT
355        179-8-2-0-A6-CS         PRT   pBluescriptII SK-
356        180-22-3-0-B6-CS        PRT   pBluescriptII SK-
357        181-13-1-0-F7-CS        PRT   pBluescriptII SK-
358        181-15-4-0-F7-CS        PRT   pBluescriptII SK-
359        181-20-1-0-G7-CS        PRT   pBluescriptII SK-
360        184-15-3-0-D1-CS        PRT   pBluescriptII SK-
361        187-12-2-0-G11-CS       PRT   pBluescriptII SK-
362        187-2-2-0-A12-CS        PRT   pBluescriptII SK-
363        187-30-0-0-k23-CS       PRT   pBluescriptII SK-
364        187-36-0-0-e19-CS       PRT   pBluescriptII SK-
365        187-38-0-0-d22-CS       PRT   pBluescriptII SK-
366        187-39-0-0-b9-CS        PRT   pBluescriptII SK-
367        187-39-0-0-g6-CS        PRT   pBluescriptII SK-
368        187-45-0-0-l18-CS       PRT   pBluescriptII SK-
369        187-45-0-0-m21-CS       PRT   pBluescriptII SK-
370        187-45-0-0-n8-CS        PRT   pBluescriptII SK-
371        187-46-0-0-f23-CS       PRT   pBluescriptII SK-
372        187-5-1-0-A12-CS        PRT   pBluescriptII SK-
373        187-5-1-0-F6-CS         PRT   pBluescriptII SK-
374        187-5-2-0-B2-CS         PRT   pBluescriptII SK-
375        187-5-3-0-D5-CS         PRT   pBluescriptII SK-
376        187-51-0-0-f9-CS        PRT   pBluescriptII SK-
377        187-6-1-0-B9-CS         PRT   pBluescriptII SK-
378        187-6-4-0-C10-CS        PRT   pBluescriptII SK-
379        188-19-2-0-C8-CS        PRT   pBluescriptII SK-
380        188-22-4-0-G6-CS        PRT   pBluescriptII SK-
381        188-28-4-0-D11-CS       PRT   pBluescriptII SK-
382        188-29-1-0-E10-CS       PRT   pBluescriptII SK-
383        188-34-4-0-E5-CS        PRT   pBluescriptII SK-
384        188-9-3-0-A5-CS         PRT   pBluescriptII SK-
385        105-021-3-0-C3-CS       PRT   pBluescriptII SK-
386        105-037-4-0-H12-CS      PRT   pBluescriptII SK-
387        105-073-2-0-A7-CS       PRT   pBluescriptII SK-
388        109-002-4-0-C6-CS       PRT   pBluescriptII SK-
389        109-003-1-0-G4-CS       PRT   pBluescriptII SK-
390        116-118-4-0-A8-CS       PRT   pBluescriptII SK-
391        145-52-2-0-D12-CS       PRT   pBluescriptII SK-
392        145-7-2-0-G5-CS         PRT   pBluescriptII SK-
393        145-7-3-0-D3-CS         PRT   pBluescriptII SK-
394        157-17-2-0-C1-CS        PRT   pBluescriptII SK-
395        160-101-3-0-H2-CS       PRT   pBluescriptII SK-
396        160-12-1-0-D10-CS       PRT   pBluescriptII SK-
397        160-28-4-0-C4-CS        PRT   pBluescriptII SK-
398        160-31-3-0-E4-CS        PRT   pBluescriptII SK-
399        160-40-1-0-H4-CS        PRT   pBluescriptII SK-
400        160-54-1-0-F7-CS        PRT   pBluescriptII SK-
401        160-88-3-0-A8-CS.cor    PRT   pBluescriptII SK-
402        160-88-3-0-A8-CS.fr     PRT   pBluescriptII SK-
403        160-99-4-0-E4-CS        PRT   pBluescriptII SK-
404        161-5-4-0-B6-CS         PRT   pBluescriptII SK-
405        174-17-1-0-D6-CS        PRT   pPT
406        174-32-4-0-F8-CS        PRT   pPT
407        174-38-4-0-D11-CS       PRT   pPT
408        174-8-2-0-C10-CS        PRT   pPT
409        179-14-2-0-F11-CS       PRT   pBluescriptII SK-
410        179-9-4-0-B8-CS         PRT   pBluescriptII SK-
411        181-10-1-0-C9-CS        PRT   pBluescriptII SK-
412        187-5-3-0-C7-CS         PRT   pBluescriptII SK-
413        188-26-4-0-F5-CS        PRT   pBluescriptII SK-
414        188-27-3-0-G1-CS        PRT   pBluescriptII SK-
415        188-29-2-0-H1-CS        PRT   pBluescriptII SK-
416        188-31-1-0-E6-CS        PRT   pBluescriptII SK-
417        188-45-1-0-D3-CS        PRT   pBluescriptII SK-
418        188-5-1-0-H6-CS         PRT   pBluescriptII SK-
419        188-9-1-0-C10-CS        PRT   pBluescriptII SK-
420        105-016-3-0-C5-CS       PRT   pBluescriptII SK-
421        105-026-4-0-D9-CS       PRT   pBluescriptII SK-
422        105-053-2-0-D9-CS       PRT   pBluescriptII SK-
423        105-069-3-0-A11-CS      PRT   pBluescriptII SK-
424        105-076-4-0-F6-CS       PRT   pBluescriptII SK-
425        105-135-2-0-F9-CS       PRT   pBluescriptII SK-
426        106-023-4-0-F6-CS       PRT   pBluescriptII SK-
427        110-001-3-0-C11-CS      PRT   pBluescriptII SK-
428        110-002-3-0-F9-CS       PRT   pBluescriptII SK-
429        114-019-3-0-D9-CS       PRT   pBluescriptII SK-
430        114-029-1-0-C6-CS       PRT   pBluescriptII SK-
431        114-032-4-0-B1-CS       PRT   pBluescriptII SK-
432        114-070-2-0-H4-CS       PRT   pBluescriptII SK-
433        116-016-3-0-F11-CS      PRT   pBluescriptII SK-
434        116-022-4-0-G2-CS       PRT   pBluescriptII SK-
435        116-052-2-0-H8-CS       PRT   pBluescriptII SK-
436        116-053-4-0-B4-CS       PRT   pBluescriptII SK-
437        116-094-3-0-H2-CS       PRT   pBluescriptII SK-
438        116-112-4-0-C7-CS       PRT   pBluescriptII SK-
439        116-123-3-0-F12-CS      PRT   pBluescriptII SK-
440        123-008-1-0-C5-CS       PRT   pBluescriptII SK-
441        145-53-2-0-H8-CS        PRT   pBluescriptII SK-
442        145-57-2-0-C9-CS.cor    PRT   pBluescriptII SK-
443        145-57-2-0-C9-CS.fr     PRT   pBluescriptII SK-
444        145-7-3-0-B12-CS        PRT   pBluescriptII SK-
445        157-12-2-0-D1-CS        PRT   pBluescriptII SK-
446        157-16-2-0-D5-CS        PRT   pBluescriptII SK-
447        157-18-2-0-A7-CS        PRT   pBluescriptII SK-
448        160-103-1-0-B10-CS      PRT   pBluescriptII SK-
449        160-104-4-0-F3-CS       PRT   pBluescriptII SK-
450        160-22-2-0-D10-CS       PRT   pBluescriptII SK-
451        160-24-3-0-F12-CS       PRT   pBluescriptII SK-
452        160-3-2-0-H3-CS         PRT   pBluescriptII SK-
453        160-58-2-0-A2-CS        PRT   pBluescriptII SK-
454        160-73-1-0-B4-CS        PRT   pBluescriptII SK-
455        160-75-4-0-F6-CS        PRT   pBluescriptII SK-
456        160-97-3-0-E9-CS        PRT   pBluescriptII SK-
457        174-1-4-0-F9-CS         PRT   pPT
458        174-12-4-0-C2-CS        PRT   pPT
459        180-19-4-0-H2-CS        PRT   pBluescriptII SK-
460        181-10-4-0-G12-CS       PRT   pBluescriptII SK-
461        181-3-2-0-F6-CS         PRT   pBluescriptII SK-
462        181-4-4-0-A12-CS        PRT   pBluescriptII SK-
463        181-9-2-0-F12-CS.cor    PRT   pBluescriptII SK-
464        181-9-2-0-F12-CS.fr     PRT   pBluescriptII SK-
465        184-13-3-0-E11-CS       PRT   pBluescriptII SK-
466        184-4-2-0-D3-CS         PRT   pBluescriptII SK-
467        184-7-1-0-E7-CS         PRT   pBluescriptII SK-
468        184-8-4-0-G9-CS         PRT   pBluescriptII SK-
469        187-10-3-0-G9-CS        PRT   pBluescriptII SK-
470        187-32-0-0-m20-CS       PRT   pBluescriptII SK-
471        187-32-0-0-n21-CS.cor   PRT   pBluescriptII SK-
472        187-32-0-0-n21-CS.fr    PRT   pBluescriptII SK-
473        187-4-2-0-E6-CS         PRT   pBluescriptII SK-
474        187-40-0-0-i15-CS       PRT   pBluescriptII SK-
475        187-47-0-0-g24-CS       PRT   pBluescriptII SK-
476 


187-9-3-0-A2-CS 


PRT 


pBluescriptll SK- 
477 | 188-26-4-0-H1-CS | PRT | pBluescriptII SK-
478 | 188-35-3-0-G9-CS | PRT | pBluescriptII SK-
479 | 188-38-4-0-D8-CS | PRT | pBluescriptII SK-
480 | 188-41-1-0-E6-CS | PRT | pBluescriptII SK-
481 | 188-42-2-0-F3-CS.cor | PRT | pBluescriptII SK-
482 | 188-42-2-0-F3-CS.fr | PRT | pBluescriptII SK-
Table II

Seq Id No | Full coding sequence | Signal sequence | Coding sequence for mature protein | Polyadenylation signal | Polyadenylation site
1 | [169-1692] | [169-249] | [250-1692] | [2126-2131] | [2152-2201]
2 | [148-1140] | [148-240] | [241-1140] | [1592-1597] | [1615-1631]
3 | [85-906] | [85-135] | [136-906] | [1159-1164] | [1184-1245]
4 | [31-1248] | [31-135] | [136-1248] | None detected | [1607-1623]
5 | [72-143] | [72-119] | [120-143] | [1416-1421] | [1438-1454]
6 | [111-1154] | [111-197] | [198-1154] | [1602-1607] | [1623-1639]
7 | [66-1256] | [66-173] | [174-1256] | None detected | [1752-1768]
8 | [190-1398] | [190-252] | [253-1398] | [1470-1475] | [1494-1510]
9 | [78-410] | [78-155] | [156-410] | None detected | [866-882]
10 | [84-299] | [84-134] | [135-299] | [1814-1819] | [1833-1849]
11 | [55-468] | [55-99] | [100-468] | [531-536] | [549-565]
12 | [152-475] | [152-244] | [245-475] | [1623-1628] | [1647-1663]
13 | [112-552] | [112-183] | [184-552] | [706-711] | [729-744]
14 | [101-1243] | [101-199] | [200-1243] | [1720-1725] | [1745-1759]
15 | [101-517] | [101-199] | [200-517] | [1716-1721] | [1741-1755]
16 | [59-853] | [59-100] | [101-853] | [894-899] | [922-936]
17 | [73-672] | [73-132] | [133-672] | [689-694] | [711-747]
18 | [94-1275] | [94-210] | [211-1275] | [1849-1854] | [1870-1884]
19 | [42-515] | [42-92] | [93-515] | [649-654] | [677-691]
20 | [271-969] | [271-366] | [367-969] | [1093-1098] | [1124-1138]
21 | [76-276] | [76-135] | [136-276] | [436-441] | [455-468]
22 | [6-287] | [6-80] | [81-287] | [684-689] | [706-720]
23 | [171-692] | [171-227] | [228-692] | [691-696] | [713-727]
24 | [137-454] | [137-187] | [188-454] | [440-445] | [456-470]
25 | [238-609] | [238-291] | [292-609] | [948-953] | [973-987]
26 | [80-862] | [80-127] | [128-862] | [875-880] | [894-908]
27 | [83-310] | [83-157] | [158-310] | [725-730] | [748-762]
28 | [310-906] | [310-357] | [358-906] | [1071-1076] | [1088-1102]
29 | [24-287] | [24-131] | [132-287] | [405-410] | [422-436]
30 | [132-1574] | [132-206] | [207-1574] | [1899-1904] | [1923-1938]
31 | [117-545] | [117-245] | [246-545] | None detected | [1100-1116]
32 | [117-362] | None detected | [117-362] | None detected | [1098-1114]
33 | [144-1262] | [144-224] | [225-1262] | [2035-2040] | [2056-2072]
34 | [35-316] | [35-109] | [110-316] | None detected | [393-409]
35 | [177-767] | [177-236] | [237-767] | None detected | [822-836]
36 | [208-1239] | [208-294] | [295-1239] | None detected | [1307-1323]
37 | [60-1682] | [60-143] | [144-1682] | None detected | [1929-1945]
76 | [191-1468] | [191-274] | [275-1468] | None detected | [1741-1757]
77 | [48-950] | [48-107] | [108-950] | [1983-1988] | [2011-2027]
78 | [156-512] | [156-206] | [207-512] | [1831-1836] | [1864-1880]
79 | [67-351] | [67-183] | [184-351] | None detected | [568-584]
80 | [259-831] | [259-375] | [376-831] | None detected | [1337-1351]
81 | [111-377] | [111-233] | [234-377] | [689-694] | [706-720]
82 | [223-432] | [223-336] | [337-432] | [986-991] | [1015-1029]
83 | [769-1272] | [769-843] | [844-1272] | None detected | [1774-1788]
84 | [30-527] | [30-74] | [75-527] | [738-743] | [756-805]
85 | [39-506] | [39-83] | [84-506] | None detected | [800-814]
86 | [115-429] | [115-210] | [211-429] | [565-570] | [584-598]
87 | [332-574] | [332-412] | [413-574] | None detected | [630-699]
88 | [133-417] | [133-213] | [214-417] | [876-881] | [891-905]
89 | [113-364] | [113-172] | [173-364] | None detected | [500-514]
90 | [9-380] | [9-104] | [105-380] | [483-488] | [504-518]
91 | [155-340] | [155-292] | [293-340] | [728-733] | [754-808]
92 | [185-634] | [185-253] | [254-634] | [704-709] | [723-737]
93 | [53-646] | [53-91] | [92-646] | [694-699] | [714-728]
94 | [247-510] | [247-318] | [319-510] | [544-549] | [568-582]
95 | [143-592] | [143-277] | [278-592] | [1877-1882] | [1898-1913]
96 | [33-458] | [33-89] | [90-458] | [637-642] | [654-670]
97 | [1-336] | [1-81] | [82-336] | [900-905] | [923-939]
98 | [174-443] | [174-269] | [270-443] | [629-634] | [647-661]
99 | [282-521] | [282-386] | [387-521] | [600-605] | [631-647]
100 | [251-643] | [251-295] | [296-643] | None detected | [990-1006]
101 | [179-475] | [179-295] | [296-475] | [995-1000] | [1015-1059]
102 | [34-327] | [34-162] | [163-327] | [466-471] | [498-514]
103 | [303-953] | [303-359] | [360-953] | [1124-1129] | [1142-1158]
104 | [97-645] | [97-156] | [157-645] | [1524-1529] | [1547-1563]
105 | [80-820] | [80-118] | [119-820] | [1587-1592] | [1606-1621]
106 | [77-388] | [77-217] | [218-388] | [524-529] | [541-557]
107 | [139-513] | [139-201] | [202-513] | [566-571] | [584-600]
108 | [81-986] | [81-134] | [135-986] | [1092-1097] | [1113-1129]
109 | [266-586] | [266-307] | [308-586] | [745-750] | [762-778]
110 | [59-745] | [59-160] | [161-745] | None detected | [1285-1301]
111 | [59-676] | [59-160] | [161-676] | None detected | [1284-1300]
112 | [15-278] | [15-146] | [147-278] | [1580-1585] | [1600-1617]
113 | [167-619] | [167-262] | [263-619] | [1598-1603] | [1617-1634]
114 | [223-417] | [223-270] | [271-417] | [655-660] | [677-693]
115 | [166-732] | [166-237] | [238-732] | [753-758] | [768-784]
116 | [75-623] | [75-215] | [216-623] | [767-772] | [788-804]
117 | [30-335] | [30-71] | [72-335] | [450-455] | [468-484]
118 | [21-752] | [21-107] | [108-752] | None detected | [970-985]
119 | [185-715] | [185-253] | [254-715] | [785-790] | [814-839]
120 | [54-527] | [54-116] | [117-527] | [545-550] | [567-583]
121 | [129-686] | [129-185] | [186-686] | [989-994] | [1008-1024]
122 | [165-614] | [165-305] | [306-614] | [719-724] | [744-760]
123 | [192-476] | [192-326] | [327-476] | [555-560] | [578-594]
124 | [16-297] | [16-93] | [94-297] | None detected | [543-559]
125 | [216-635] | [216-335] | [336-635] | [717-722] | [728-744]
126 | [164-280] | [164-268] | [269-280] | [789-794] | [809-824]
127 | [68-301] | [68-190] | [191-301] | [485-490] | [510-526]
128 | [179-427] | [179-298] | [299-427] | [579-584] | [602-618]
129 | [22-297] | [22-66] | [67-297] | [742-747] | [760-776]
130 | [9-845] | [9-134] | [135-845] | [964-969] | [983-998]
131 | [27-578] | [27-119] | [120-578] | [742-747] | [763-779]
132 | [408-710] | [408-533] | [534-710] | [985-990] | [1009-1025]
133 | [247-501] | [247-306] | [307-501] | None detected | [592-607]
134 | [333-602] | [333-416] | [417-602] | None detected | [761-774]
135 | [110-376] | [110-208] | [209-376] | [582-587] | [601-611]
136 | [22-417] | [22-66] | [67-417] | [888-893] | [909-925]
137 | [62-367] | [62-103] | [104-367] | [638-643] | [658-674]
138 | [107-1618] | [107-178] | [179-1618] | [1688-1693] | [1709-1725]
139 | [16-471] | [16-93] | [94-471] | None detected | [1458-1474]
140 | [222-374] | [222-299] | [300-374] | None detected | [637-653]
141 | [59-274] | [59-127] | [128-274] | [1452-1457] | [1474-1490]
142 | [158-442] | [158-301] | [302-442] | [621-626] | [645-661]
143 | [5-454] | [5-64] | [65-454] | [1745-1750] | [1773-1789]
144 | [241-1302] | None detected | [241-1302] | [1968-1973] | [1990-2006]
145 | [15-635] | None detected | [15-635] | [1057-1062] | [1080-1096]
146 | [109-738] | None detected | [109-738] | [1633-1638] | [1650-1666]
147 | [21-1145] | None detected | [21-1145] | [1648-1653] | [1666-1687]
148 | [70-1596] | None detected | [70-1596] | [1712-1717] | [1733-1747]
149 | [129-362] | None detected | [129-362] | [597-602] | [626-658]
150 | [109-594] | None detected | [109-594] | [1999-2004] | [2029-2045]
151 | [150-587] | None detected | [150-587] | None detected | [772-788]
152 | [173-847] | None detected | [173-847] | [1894-1899] | [1915-1931]
153 | [100-441] | None detected | [100-441] | [479-484] | [500-514]
154 | [32-1132] | None detected | [32-1132] | None detected | [1167-1183]
155 | [160-996] | None detected | [160-996] | [1504-1509] | [1529-1545]
156 | [11-529] | None detected | [11-529] | [1042-1047] | [1053-1068]
157 | [135-749] | None detected | [135-749] | [1055-1060] | [1081-1097]
158 | [98-637] | None detected | [98-637] | [862-867] | [878-894]
159 | [221-670] | None detected | [221-670] | [669-674] | [688-703]
160 | [165-674] | None detected | [165-674] | [808-813] | [833-849]
161 | [165-671] | None detected | [165-671] | [805-810] | [830-846]
162 | [28-1128] | None detected | [28-1128] | [1121-1126] | [1159-1176]
163 | [135-194] | None detected | [135-194] | [1050-1055] | [1068-1084]
164 | [173-847] | None detected | [173-847] | [1757-1762] | [1776-1793]
165 | [8-1141] | None detected | [8-1141] | None detected | [1832-1849]
166 | [136-264] | None detected | [136-264] | [1720-1725] | [1731-1748]
167 | [14-1048] | None detected | [14-1048] | [1234-1239] | [1258-1275]
168 | [70-777] | None detected | [70-777] | [987-992] | [1007-1023]
169 | [38-400] | None detected | [38-400] | [1043-1048] | [1069-1085]
170 | [63-572] | None detected | [63-572] | [750-755] | [767-776]
171 | [160-867] | None detected | [160-867] | [1178-1183] | [1203-1219]
172 | [68-640] | None detected | [68-640] | None detected | [1471-1487]
173 | [132-1298] | None detected | [132-1298] | [1873-1878] | [1899-1915]
174 | [259-1701] | None detected | [259-1701] | None detected | [1974-1990]
175 | [213-1274] | None detected | [213-1274] | [1940-1945] | [1955-1971]
176 | [68-127] | None detected | [68-127] | None detected | [1597-1613]
177 | [65-1024] | None detected | [65-1024] | [1291-1296] | [1315-1361]
178 | [109-585] | None detected | [109-585] | [1059-1064] | [1082-1113]
179 | [29-577] | None detected | [29-577] | [1917-1922] | [1944-1960]
180 | [23-451] | None detected | [23-451] | [1405-1410] | [1427-1443]
181 | [232-450] | None detected | [232-450] | None detected | [589-605]
182 | [758-1183] | None detected | [758-1183] | None detected | [1708-1724]
183 | [486-932] | None detected | [486-932] | None detected | [1670-1686]
184 | [80-304] | None detected | [80-304] | None detected | [452-463]
185 | [188-691] | None detected | [188-691] | [707-712] | [727-773]
186 | [94-573] | None detected | [94-573] | None detected | [739-753]
187 | [181-462] | None detected | [181-462] | None detected | [740-754]
188 | [6-290] | None detected | [6-290] | None detected | [971-998]
189 | [115-411] | None detected | [115-411] | [573-578] | [591-605]
190 | [3-368] | None detected | [3-368] | [481-486] | [511-526]
191 | [174-527] | None detected | [174-527] | [878-883] | [896-910]
192 | [57-203] | None detected | [57-203] | [579-584] | [599-668]
193 | [68-334] | None detected | [68-334] | [562-567] | [583-637]
194 | [183-443] | None detected | [183-443] | [670-675] | [692-706]
195 | [94-228] | None detected | [94-228] | None detected | [656-670]
196 | [133-327] | None detected | [133-327] | [465-470] | [496-510]
197 | [22-357] | None detected | [22-357] | None detected | [486-500]
198 | [4-333] | None detected | [4-333] | [633-638] | [653-667]
199 | [1-363] | None detected | [1-363] | [474-479] | [498-514]
200 | [41-337] | None detected | [41-337] | None detected | [401-462]
201 | [1-551] | None detected | [1-551] | None detected | [535-551]
202 | [34-315] | None detected | [34-315] | None detected | [534-550]
203 | [1-315] | None detected | [1-315] | [371-376] | [392-408]
204 | [94-582] | None detected | [94-582] | None detected | [651-665]
205 | [540-923] | None detected | [540-923] | None detected | [994-1008]
206 | [77-364] | None detected | [77-364] | [367-372] | [391-455]
207 | [65-544] | None detected | [65-544] | [710-715] | [733-749]
208 | [117-467] | None detected | [117-467] | [557-562] | [578-594]
209 | [893-1897] | None detected | [893-1897] | [2066-2071] | [2082-2098]
210 | [85-342] | None detected | [85-342] | None detected | [412-428]
211 | [155-433] | None detected | [155-433] | [713-718] | [735-769]
212 | [63-386] | None detected | [63-386] | [878-883] | [898-914]
213 | [460-1290] | None detected | [460-1290] | [1449-1454] | [1473-1489]
214 | [21-539] | None detected | [21-539] | [741-746] | [760-776]
215 | [34-1143] | None detected | [34-1143] | [1375-1380] | [1397-1412]
216 | [6-1184] | None detected | [6-1184] | [1735-1740] | [1744-1773]
217 | [29-376] | None detected | [29-376] | None detected | [1184-1251]
218 | [78-566] | None detected | [78-566] | [858-863] | [878-894]
219 | [16-705] | None detected | [16-705] | [868-873] | [894-910]
220 | [103-405] | None detected | [103-405] | [482-487] | [503-519]
221 | [72-350] | None detected | [72-350] | [593-598] | [616-632]
222 | [38-436] | None detected | [38-436] | None detected | [636-652]
223 | [38-322] | None detected | [38-322] | None detected | [634-650]
224 | [202-480] | None detected | [202-480] | [472-477] | [488-502]
225 | [171-1670] | None detected | [171-1670] | [1706-1711] | [1725-1739]
226 | [199-618] | None detected | [199-618] | [626-631] | [643-657]
227 | [182-481] | None detected | [182-481] | None detected | [874-888]
228 | [161-517] | None detected | [161-517] | None detected | [701-716]
229 | [86-505] | None detected | [86-505] | [618-623] | [638-654]
230 | [56-382] | None detected | [56-382] | [598-603] | [619-635]
231 | [56-355] | None detected | [56-355] | [597-602] | [618-634]
232 | [76-498] | None detected | [76-498] | [546-551] | [567-583]
233 | [199-600] | None detected | [199-600] | [705-710] | [737-753]
234 | [211-612] | None detected | [211-612] | [717-722] | [746-762]
235 | [5-259] | None detected | [5-259] | [502-507] | [521-537]
236 | [23-370] | None detected | [23-370] | [956-961] | [978-994]
237 | [41-352] | None detected | [41-352] | None detected | [646-662]
238 | [3-1319] | None detected | [3-1319] | [1791-1796] | [1813-1829]
239 | [421-768] | None detected | [421-768] | [1045-1050] | [1067-1083]
240 | [78-590] | None detected | [78-590] | None detected | [1815-1831]
241 | [78-608] | None detected | [78-608] | None detected | [1814-1830]
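The coordinate columns of Table II follow a regular relationship. Where a signal sequence is reported, it starts at the first position of the full coding sequence, and the coding sequence for the mature protein starts at the next position and ends where the full coding sequence ends. Where no signal sequence is detected, the mature coding sequence coincides with the full coding sequence. The following Python sketch is offered only as an illustration of that reading and is not part of the disclosed methods. It assumes that the bracketed values are 1-based, inclusive nucleotide positions; the function names `parse_range`, `check_row` and `signal_peptide_codons` are hypothetical.

```python
import re

def parse_range(cell):
    """Parse a cell such as '[169-249]' into (start, end).

    Returns None for cells that hold no range (e.g. 'None detected').
    """
    m = re.fullmatch(r"\[(\d+)-(\d+)\]", cell.strip())
    return (int(m.group(1)), int(m.group(2))) if m else None

def check_row(full, signal, mature):
    """Return True if the three coordinate cells of a row are mutually consistent."""
    full_r, sig_r, mat_r = parse_range(full), parse_range(signal), parse_range(mature)
    if full_r is None or mat_r is None:
        return False
    if sig_r is None:
        # No signal sequence: mature coding sequence equals full coding sequence.
        return mat_r == full_r
    return (sig_r[0] == full_r[0]          # signal starts with the coding sequence
            and mat_r[0] == sig_r[1] + 1   # mature part starts right after the signal
            and mat_r[1] == full_r[1])     # both end at the same position

def signal_peptide_codons(signal):
    """Number of whole codons spanned by the signal sequence, or None if absent."""
    r = parse_range(signal)
    return None if r is None else (r[1] - r[0] + 1) // 3

if __name__ == "__main__":
    # Row for Seq Id No 1 of Table II.
    print(check_row("[169-1692]", "[169-249]", "[250-1692]"))
    print(signal_peptide_codons("[169-249]"))
```

A check of this kind can flag transcription errors in individual rows, since a mismatch between the three cells indicates that at least one coordinate has been mis-copied.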
Table III

List of variants

92;119
14;15
110;111
69;174;76
2;12
172;176;177
150;152;164;166
154;162
77;143
34;62
230;231
63;68
8;47
48;49;66
7;72
160;161
144;175
17;21
31;32
5;6
3;10
96;121
37;41;59
70;71
19;24
186;195;204
73;74
240;241
221;235
222;223
42;45
157;163
190;229
117;137
122;233;234
201;202
80;139
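Each line of Table III appears to group the Seq Id Nos that are variants of one another, separated by semicolons. As a minimal sketch under that assumption, the grouping can be turned into a lookup as follows. The function names `parse_variant_groups` and `variants_of` are hypothetical and are used here for illustration only.

```python
def parse_variant_groups(lines):
    """Turn lines such as '150; 152; 164; 166' into lists of integer Seq Id Nos."""
    groups = []
    for line in lines:
        ids = [int(tok) for tok in line.replace(" ", "").split(";") if tok.isdigit()]
        if len(ids) >= 2:  # a variant group needs at least two members
            groups.append(ids)
    return groups

def variants_of(seq_id, groups):
    """Return the other members of the group containing seq_id, or an empty list."""
    for group in groups:
        if seq_id in group:
            return [i for i in group if i != seq_id]
    return []

if __name__ == "__main__":
    groups = parse_variant_groups(["92;119", "150; 152; 164; 166", "5;6"])
    print(variants_of(152, groups))
```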
Table IV





Preferentially excluded fragments




192 235-2099 2201 


9 


174 995*1605 1631 




1111 1945 




1SQ0 15Q8-1607 1693 

iJ7U..lJ!70,l UvJ / . . 1 \J.£.7 




1385 1453 




1571 1639 


7 


1739 1768 


ft 
o 


14Q4 H10 


Q 

y 


570 882 


10 

1 \7 


1176 1218-1710 1749-1833 1849 


1 1 

1 1 


219 253-455 565 


12 


178 229-1636 1663 


1 3 


729. .744 


14 


790 827*1735 1759 




788 825*1731 1755 


1 6 


Q99 Q36 


1 7 

1 / 


668 747 


1 8 
I o 


1 870 1 884 


1 0 


677 691 




1194 1138 

1 1Z4.. I 1 JO 


91 


4S0 468 


99 


^0^ 41 1 -706 790 

J7J.. i +l 1, / uu.. / ZU 


9^ 


71a 797 


94 


4S6 470 


9^ 


876 Q98-Q7^ Q87 


96 


894 908 


97 
z / 


748 769 


98 


1088 1109 


90 

£y 


499 436 


30 


1879 1918-1993 1938 


31 | 774..1116
32 | 772..1114
33 | 2056..2072
34 | 393..409
35 | 784..836
36 | 544..551;1307..1323
37 | 1867..1874;1929..1945
38 | 1315..1330
71 OS 9174 

z iuo..z i z*+ 


40 


41 ^ 471 -1116 1 1 SQ 


41 


1 1R70-1CH6 1Q^^ 


47 
4Z 


IOZj.. lOoo 




i o7j.. iy4z 


AA 

44 


1 £40 1A^7 
1 04U.. 103 / 


A^ 

4 J 


1£A1 Mil 


Af* 
40 


1 ccc i 071 
1 J J J., lo / 1 


Al 
4 / 


1 ^H7 1 ^71 
1 JU /.. 1 DZj 


A 0 
4o 


^/l 1 017 

D41. .ojZ 


4V 


C/lfl Oil 


jU 


VU1..V1 / 


C 1 

D I 


7 in-AfK A71 

Z.. 1U,0Uj..0Z1 


*\7 


cqc £71 


J D 


OOJ..07 / 


<A 
04 


A 1 1*7^1 1 1 fi1 
4..1 J, /Oi.. 1 1U1 


JJ 


1 Cil 1 1 047 
IUj 1 .. 1U4 / 


3D 


sii'x qo<\-qo7 on 

o / D..y\jJ,y\j 1 ..yZj 


^7 


1774 1740 
1ZZ4.. 1Z4U 


CO 
JO 


o/ci QA7 




1 o4Z . . 1 o4y , 1 yjj.. 1 yoy 


ou 


ill/: 1117 
1 1 io.. 1 1 jZ 


£1 

oi 


1 ^ A/Z-C\ Z CI 1 
1 J. .40,01 J. .03 1 


<7 
OZ 


A^1 777 
OM.. /ZZ 


oi 


1 /I 7£ 1 /Ml 

14zo.. 144z 


04 


Tin 7ftr 


o!> 


1 77H 1 71/C 

IZZU.-lzJO 


oo 


r^A OQ1 

DZU. .ooi 


of 


/111 C7/1 

41 J..JZ4 


Oo 


1 AAA 1/177 
1444.. 14 /Z 


OV 


1 77 1 1 717 
1 /Z1..1 ID 1 


7H 
/U 


1A71 1£17 
10Z1..10J / 


7 1 
/ 1 


1A7H 1£1£ 
10ZU.. lOJO 


77 

/Z 


777 79/1*17/17 17CC 
/ / / .. /o4, 1 /4Z..1 / Jo 


71 


1 CI 1 1 </l 7 
10.3 1 .. 104 / 


1A 
/4 


10JU.. 1040 


/ J 


1 Q/17 1 G/C1 

iy4/..lyoj 


76 


1741 17^7 
1 /HI ..1 / J / 


77 | 1561..1913;2011..2027
78 | 727..819;880..894;901..1280;1841..1880
79 | 418..584
80 | 331..353;844..1214;1337..1351
81 | 706..720
82 | 639..713;1008..1029
83 | 1454..1788
84 | 712..805
85 | 800..814
86 | 584..598
87 | 122..308;593..699
88 | 855..905
89 | 500..514
90 | 81..101;198..205;504..518
91 | 650..808
92 | 128..201;723..737
93 | 714..728
94 | 568..582
95 | 1761..1773;1898..1913
96 | 654..670
97 | 883..938
98 | 616..661
99 | 631..647
100 | 853..1006
101 | 537..544;949..1059
102 | 498..514
103 | 1142..1158
104 | 1524..1563
105 | 1230..1259;1606..1621
106 | 505..557
107 | 584..600
108 | 378..385;1113..1129
109 | 729..778
110 | 992..1301
111 | 991..1300
112 | 1131..1139;1569..1617
113 | 1526..1634
114 | 457..509;677..693
115 | 768..784
116 | 360..670;788..804
117 | 435..484
118 | 433..452;764..985
119 | 128..201;801..839
120 | 554..564;567..583
121 | 872..908;1008..1024
122 | 744..760
121 


S7R 594 


124 


94 102*248 559 


125 


778 744 




809 824 


1 27 


510 S26 




602 618 


12Q 


472 551-560 776 


1 10 


081 008 
yoj . *yyo 


111 
1 D 1 


1Q£ 4A8-7A1 770 
J70..4DO, IOj.. 1 ly 


1 12 

1 «7Z, 


478 517-1000 107S 


1 11 


SO? 607 


1 14 


7£1 774 
/Ol .. / /H 


1 IS 


SS£ SA1 -£01 £1 1 


116 


887 010 
oo / ..y i y 


1 17 


658 674 


1 18 


1651 177S 


1 19 


40 71-988 1158-14S8 1474 


140 


194 651 


141 


720 710-1440 1490 

/ lv.. / JU, 1447.. 


1 42 


44 110-408 S0S-S78 S8S-A4S ££1 

*T*T.. 1 1 7 J i f70..jUj,J / O..JOJ,O £ tJ..00 1 


141 


1172 1666-1771 1780 

1 jZZ.. lOOO, 1 / / J..1 /07 


144 


1878 1 807-1 01 Q 1QA8-1QQO 700A 


14S 
IHJ 


01£ QSS-10A0 100£ 


i 4£ 


778 877-1£SO 1££A 
/ /0..5Z /, lODU.. lOOO 


1 47 
1H / 


1170 1707-1A47 1 £87 
1 1 / U.. 1ZU / , 104 / .. 105 / 




1711 1747 
1 / J J.. 1 /4 / 


140 


S7Q £S8 


1 SO 


1417 1440-1778 1778-7004 704S 
14JZ.. 144U, 1 /Zo..l / /o,ZUU4..ZU4j 


1 S1 


777 788 
/ / Z.. / oo 


1 S2 


140£ 1S04-17Q7 1847-1Q1S 1Q11 
LHyo.. i jU'f, i /yz.. i o4Z, i y io.. i i 


1 SI 


S00 S14 


1 S4 


1 1 £7 1 1 81 

1 1Q/..1 1 OD 


1 SS 


1 S7Q 1 S4S 


1 S6 


701 1068 

/ .. 1 UQO 


1 S7 


871 881 - 1 08 1 1 OQ7 
o / J..OO 1 , lUo 1 .. L\Jy 1 


1 S8 

i JO 


878 804 


1 SO 


£88 701 


160 


811 840 


161 | 830..846
162 | 1159..1176
163 | 869..876;1068..1084
164 | 1444..1463;1496..1504;1743..1793
1 

103 


1911 111Q-1AQ7 1G/IQ 
1Z3 J.. 13 IV, loy / .. lo4V 


100 


\AC\1 149A-14^Q 1AA7-1AQ4 1748 
I*fU / .. 14ZO, HJ7.. 140/, 10:74.. 1 /40 


1 £7 
10 / 


1 OC O 1 77C 
1230\. 12 /3 


lOo 


S/1 19Q.inno 1H91 
04.. 1ZV, 1UUZ.. IUZj 


1 £Q 
109 


Al£ A79-<\Q£ AA4-A77 AQQ.710 G^A-QQC 1 no<\ 
4 jo.. 4 /2,3yo..oU4,o /j..voy, / jz..Vj4,V7J.. 1Uo3 


1 H(\ 
1 /O 


7/C7 77/: 

/0 /.. / /o 


1 71 
1/1 


1 om 1 o 1 o 
1203.. iziy 


1 OO 
1 11 


1/111 1/1 Q7 

141 1..140/ 


1 7^ 

1 /3 


lo01..191 5 


1 7/1 

174 


1 o7/i 1 nnn 
19 /4.. 1990 


1 7C 

1/3 


i onn io/cn.iom in/in.incc imi 
1 oOO. . 1 o09, 1 o9 1 . . 1 940; 1 95 5 . . 1 9 / 1 


1 /O 


1 cm 1 c 1 1 
159 /..1013 


1 77 
ill 


1 O/C 010.1077 11/C1 

loo. .212; 12 / /..1301 


1 7C 

1 /o 


ata 070' 1 AA1 1111 

930.. 9 /o, 1002.. 1 113 


1 TO 

1 /9 


951. .1000, 1304. .033,1 944.. 1900 


loO 


1 A 07 1/1/11 

1 42 / . . 1 443 


101 

lol 


1 A7 101-07/C 711. /MO. £AC 

10 /..lol ,2 /0..31 1,449. .00:) 


1 07 
lOZ 


11/11 1/1Cn«1£77 177/1 

1 143.. 1430, io/ / .. 1 /24 


1 01 

loo 


1 OCI-A/tO A^C'11/17 1/^CA 

1 ..23 l,04o..o33,134/..100O 


1 O/l 

lo4 


/t A 7 /l £1 

44/.. 403 


1 oc 
lo5 


i cn 1 cn./coi 77i 
150..159;023.. / /3 


1 0/C 

loo 


O/in /I7/C71ft 7f -J 

340. .4/0, /39.. /53 


1 07 

lo7 


7/1 a 7C /I 

740. .754 


1 oo 

loo 


oat 11C/C/CO nno 

307..315,00o..99o 


1 on 

lo9 


1 io loc.con co/c.cm /cnc 
1 lo\.125;529..530;591..005 


1 ftA 

190 


492. .520 


i ft 1 
191 


O'TO n 1 A 

O/2..910 


192 


COC /C/CO 

525. .00o 


193 


m 10C./1/C1 /C57 

91..135;401..037 


1 n/i 
194 


ono /ico.cci /C7i./cno 7n/c 
3 92.. 45 0,55 1..07 1 ;092.. /00 


1 nc 
19!) 


050. .0/0 


1 n/c 
190 


ooo i7n./ico /i/^/c./in/c cin 
2o3..3 /9,45o..400;490..5 10 


1 m 
19/ 


1 n/c./ioo cnn 
1..90;4o3..500 


1 no 
19o 


iCOC <C/C7 

025.. 007 


1 nn 
199 


AHA C1 1 

4 /4..5 13 


onn 
200 


17A /I /CO 

3 /0..402 


on 1 
201 


COC C C 1 

535. .551 


ono 
2U2 


334. .330 


203 | 374..408
204 | 651..665
205 | 994..1008
206 | 348..455
707 
zu / 


740 


708 


1 40-S78 S04 


700 
ZA)y 


7087 7008 


z ii/ 


417 478 


71 1 
Z 1 1 


680 760 
ooy.. /oy 


71 7 
Z 1Z 


QQQ Q14 


9 1 1 
Zio 


1766 i/iso 
izoo.. moy 


7 1 A 
Z l*t 


lfs.C\ 776 

/ou.. / /o 


9 1 ^ 

Z 1 J 


1 lOA 1111-1181 1/119 


71 A 
Z 10 


6/18 AQ1 -1711 1 771 


9 1 7 
Z 1 / 


f^AA 2^6-QIO 19^1 


9 1 8 
Z 1 o 


070 OQ/1 


9 1 Q 

z iy 


QQA 010 


990 
ZZU 


^01 ^1Q 

jUj..j iy 


991 

ZZ 1 


&\f\ £19 
O IO..OJZ 


999 
ZZZ 


616 6^9 
OJO..OJZ 


771 
ZZj 


£14 £<;o 


99A 
ZZ't 


^0 ^7-488 ^09 


99^ 
ZZ 3 


^14 ^77-179^ 171Q 


99A 

zzo 


/oil £^7 
O^f J..OJ / 


997 
ZZ / 


1 8/1-87/1 888 


ZZo 


7A1 71 A 
/U I / 10 


77Q 
ZZy 


£18 £^/l 


71ft 


7£1 <71-£1Q £1^ 
ZOJ..J / 3,0 1 V . .OJ J 


71 1 
zJ 1 


7£1 ^71£1Q £1^ 


z3z 


C/C7 ^01 


711 
Zd5 


717 7^1 


71/1 


7/1£ 7£7 
/40.. /OZ 


ZD J 


AQO ^17 


716 
Z JO 


00S 017 044 004 

y\jj ..y iz,74 i t..77 £ t 


237 | 348..662
239 | 829..1083
240 | 1508..1831
241 | 1507..1830
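The fragment entries of Table IV use a compact notation in which each fragment is written as `start..end` and several fragments for the same Seq Id No are joined by semicolons. The following Python sketch shows one way such an entry might be parsed. It assumes 1-based, inclusive positions, and the function names `parse_fragments` and `total_length` are hypothetical, introduced here for illustration only.

```python
def parse_fragments(cell):
    """Parse an entry such as '544..551;1307..1323' into a list of (start, end) tuples."""
    fragments = []
    for part in cell.strip().split(";"):
        start, end = part.split("..")
        fragments.append((int(start), int(end)))
    return fragments

def total_length(fragments):
    """Total number of positions covered, assuming inclusive, non-overlapping fragments."""
    return sum(end - start + 1 for start, end in fragments)

if __name__ == "__main__":
    # Entry for Seq Id No 36 of Table IV.
    frags = parse_fragments("544..551;1307..1323")
    print(frags, total_length(frags))
```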
Table Va

Seq Id No | Preferentially excluded fragments | Preferentially included fragments
1 | [1-540];[556-615];[2061-2096];[2098-2201] | [541-555];[616-2060];[2097-2097]
2 | [1-511];[533-619];[621-690];[730-1132] | [512-532];[620-620];[691-729];[1133-1631]
3 | [2-539];[1178-1245] | [1-1];[540-1177]
4 | [1-250];[297-383];[386-514];[1025-1064] | [251-296];[384-385];[515-1024];[1065-1623]
5 | [27-116];[118-391] | [1-26];[117-117];[392-1454]
6 | [1-93];[96-168];[170-262];[264-461] | [94-95];[169-169];[263-263];[462-1639]
7 | [1-95];[97-451] | [96-96];[452-1768]
8 | [1-502];[1314-1491] | [503-1313];[1492-1510]
9 | [1-864] | [865-882]
10 | [1-428] | [429-1849]
11 | [1-454];[482-514] | [455-481];[515-565]
12 | [1-375];[379-511];[533-690];[730-783];[814-1164] | [376-378];[512-532];[691-729];[784-813];[1165-1663]
13 | [2-337];[339-556] | [1-1];[338-338];[557-744]
14 | [29-366];[368-507] | [1-28];[367-367];[508-1759]
15 | [29-366];[368-524] | [1-28];[367-367];[525-1755]
16 | [1-641] | [642-936]
17 | [1-708];[711-747] | [709-710]
18 | [1-639] | [640-1884]
19 | [1-631] | [632-691]
20 | [3-416];[418-490] | [1-2];[417-417];[491-1138]
21 | [1-468] | None
22 | [1-720] | None
23 | [1-711] | [712-727]
24 | [1-469] | [470-470]
25 | [1-231];[234-488] | [232-233];[489-987]
26 | [1-296];[300-642];[644-737] | [297-299];[643-643];[738-908]
27 | [1-306];[308-762] | [307-307]
28 | [1-446];[448-1102] | [447-447]
29 | [1-436] | None
30 | [7-334];[1420-1468];[1474-1614];[1616-1804];[1845-1919] | [1-6];[335-1419];[1469-1473];[1615-1615];[1805-1844];[1920-1938]
31 | [1-342];[345-519];[823-893];[977-1016] | [343-344];[520-822];[894-976];[1017-1116]
32 | [1-517];[821-891];[975-1014] | [518-820];[892-974];[1015-1114]
33 | [36-352];[354-457];[728-832];[834-1096];[1253-1289];[1291-1350];[1352-1412];[1726-1873] | [1-35];[353-353];[458-727];[833-833];[1097-1252];[1290-1290];[1351-1351];[1413-1725];[1874-2072]
34 | [1-409] | None
35 | [14-105] | [1-13];[106-836]
36 | [1-572];[1120-1271] | [573-1119];[1272-1323]
37 | [20-98];[100-510];[1591-1681];[1683-1870] | [1-19];[99-99];[511-1590];[1682-1682];[1871-1945]


19 
JO 


[1-J4/J 


[j^+o-l J J\J\ 


ig 

jy 


[1-44 j J 


XAAfs 7 10/11 


a a 


n /nii.r/nc ooi 
[l-4/3J;[4/5-JZoJ 


f/17/1 /17/ll-r^9Q 1 1 ^Ql 
[4 / 4~4 / 4J , [ JZV- 1 ljyj 


A 1 

41 


[iO-jOO],[l Jo /-I oOOj 


[i-i jj,[ju / - 1 jooj, [i oo / - 1 y j j j 


42 | [2-234];[244-451];[974-1226] | [1-1];[235-243];[452-973];[1227-1688]
43 | [1-455];[1670-1925] | [456-1669];[1926-1942]
44 | [1-579];[815-1031] | [580-814];[1032-1657]
45 | [1-489];[1012-1264] | [490-1011];[1265-1733]
46 | [1-400];[1184-1223];[1225-1705];[1740-1818] | [401-1183];[1224-1224];[1706-1739];[1819-1871]


47 


[l-jzyj,[13z6-i jOj] 


r^in iio^i-n^n^ 1^711 

[j 3U-1 JZJj ? [l JUO-I jZj] 


48 


[1-1 5 1J,[1 jj-j lUJ,poU-joyj 


nio 1171-r^ii ^^Qi-r^on ri?i 

[IJZ-l JZJ,[Jl 1 -J Jy },[Jy\J-OJZ} 


/lO 

4y 


n iini'Tiio ^noi-r^o ^sci 


[1J1-1J1J,[J 1U-J JoJ,[Jo:*-oJ 1 J 


50 | [1-650];[652-868];[873-913] | [651-651];[869-872];[914-917]
51 | [1-504];[515-605] | [505-514];[606-621]


52 


[1-535] 


[536-6/3J 


53 


[2-563] 


[l-l];[564-oy /] 


C /I 

54 


n crn.roAi 07Ai.roo / "> ni/ii.rn/C/c 

[l-527J;l802-870J;[882-y34J;[966- 


[jz5-o0ij,[o / l-oo i j,[y3j-yojj,[iuiy- 
1036H1081-11011 


55 


[1-326J;[328-505J 


rio7 ii7i-r^nA in/171 
[3z t oz /J,[jUo-104/J 


56 


r 1 1 A AT 

[1-34UJ 


[j4i-yzjj 


57 


r i COOT 

[1-528] 


[jzy-lz40J 


58 


n i Aoi.n 1 c icii.nc/i i/iAi.ri/i'i c.7Qi 
[l-10oj;[l 15-1 j 1J ? [1 j4-340j,[34z-jzyj 


riAQ ii/ii-ri^7 i^n-ri/ii i/in-r^in onoi 

[lUV-l 14J,[1 JZ-1 J 3J,[J41-34 1 J,[J JU-yUZJ 


59 


[4-485] ;[1 566- 1656J;[16jo-io4jJ 


n ii-r/ic^ i^^^i-riA^7 i^^i-nsMA iqaqi 
[i-3j,[4oo-i joz)J,[10j /-loj /j,[i o^fo-iyoyj 


ZTA 


[1-283] 


viqa iiioi 

[zo4-l 1 JZJ 


/C 1 

61 


[9-468J 


r 1 Cl- YA&Q £1 1 1 

[i-oj 5 [4oy-oo ij 


/CO 

6z 


T1 CXl.r/CQQ 7701 

[l-jzjJ 5 [6oy-/zzJ 


[Z>ZO-OooJ 


63 


[ i -oo J ; [y o- 1 "zj ,[ i y4-zo j j , [zyo-4uyj 


rsQ coi-nQi iQii-ri^^ 9Q^i-rziin \aai^\ 
[ov-oyj,[iyj-iyjj J [zoo-zyjj,[ £ +iu-i't^+zj 


64 


r 1 ci 7i 
[1-51 /J 


[3 1 o- fyj\ 


65 | [1-406];[408-739] | [407-407];[740-1236]
66 | [1-489];[849-881] | [490-848]
67 | [1-505] | [506-524]
68 | [1-325];[328-441];[444-504] | [326-327];[442-443];[505-1472]
69 | [1-524];[636-715];[717-809];[811-885];[1567-1715] | [525-635];[716-716];[810-810];[886-1566];[1716-1737]
70 | [12-487] | [1-11];[488-1637]
71 | [12-487] | [1-11];[488-1636]
72 | [1-451] | [452-1758]
73 | [1-167];[242-464] | [168-241];[465-1647]
74 | [1-167];[242-464] | [168-241];[465-1646]


/ J 


[ 1 -4 I 1 J 


M77 1 G£71 

[4 /z-iyojj 


76 | [1-358];[360-543];[655-734];[736-828];[830-904];[1586-1734] | [359-359];[544-654];[735-735];[829-829];[905-1585];[1735-1757]

77 


[3-j4J,[3o-4/4J ? poz-/ /UJ,[1 /Uy-1 /4oJ,[l /4o- 
1785iri825-18991 


[l-zJ;[3j-iDj;[4/j-jolJ;L//l-l /0oj;[l 74 /- 
1 7471 • T 1 786- 1 8241 ■ \ 1 900-20271 


78 
/o 


n 7^i-r77 iiGi-roi/i in^oi-nn^i 
1126];[1 168-1203] 


L/o-/oJ,L3ZU-y 13J ? [lUji-lUozJ 5 Ll 1Z/- 
1 167];[1204-1880] 


79 | [1-425] | [426-584]


OU 


n 7^oi-roA7 ioi7i-no8/i i nm 

[l-/DZj,Ly4/-lUl /J,[1U54-1 1 /UJ 


r7^7 Q/t/^i'Tinic in8ii.ni7i 11^11 
[ /jj-V4oj,[1U1o-1UOjJ,[1 1 /1-1Jj1J 


o 1 


T1 /1QA1-MQ8 7701 
[ 1 -4VDJ, [4V 0- / ZUJ 


MQ7 /1Q71 

[4y /-4y / j 


5Z 


[1-JZ4J 


1 moi 
LJZj-IUZVJ 


81 
OJ 


n _/i77i« n /17/t i ^7Gi' n ^^7 i^£/;i.ri^77 

[1-4 / /J 5 [ 14/4-1 JZyj,[lD3 /-l JOOj ? [l J / /- 

1616];[1622-1662];[1717-1753] 


[4/5-14 /ij;[l jjU-1 j3oj;[1 do/- 1 j /oJ;[ 161 /- 
16211-r 1663-1 716N1754-17881 


R4 


ri-406i-r400 s6Ri-r7s? ro^i 


r4Q7-4Q81-r^6Q 7^11 


RS 


["1-5771 

[_ 1 -jz. / J 


TS9R R141 


R6 


F1 -^601 


[JO 1 O^oJ 


R7 

o / 


n -7RirR0-^R^i*r6?s-600i 


T70 701-TSR4 6741 


RR 


M-RR01 
y i ~ooy j 


TR00 00^1 


89


[1-513]


[514-514]


90


[1-122];[124-155];[157-435];[437-517]


[123-123];[156-156];[436-436];[518-518]


91


[1-133];[165-808]


[134-164]


92


[1-725]


[726-737]


93


[1-409]


[410-728]


94


[1-331]


[332-582]


95


[1-410]


[411-1913]


96


[1-501]


[502-670]


97


[1-141];[143-431]


[142-142];[432-939]


98


[1-193]


[194-661]


99


[1-629]


[630-647]


100


[1-520];[862-954];[976-1005]


[521-861];[955-975];[1006-1006]


101 


[1-489];[581-961];[1010-1059]


[490-580];[962-1009]


102


[1-485]


[486-514]


103


[1-540]


[541-1158]


104


[1-556]


[557-1563]


105


[1-868];[870-1006]


[869-869];[1007-1621]


106 


[1-491] 


[492-557] 


107 


[1-573]


[574-600]


108 


[1-457];[586-1110]


[458-585];[1111-1129]


109 


[1-521];[655-778]


[522-654] 


110 


[1-416];[478-614];[616-990];[992-1065];[1068-1283]


[417-477];[615-615];[991-991];[1066-1067];[1284-1301]


111


[1-416];[478-614];[628-989];[991-1064];[1067-1282]


[417-477];[615-627];[990-990];[1065-1066];[1283-1300]


112


[2-429];[1161-1202];[1212-1388];[1392-1589]


[1-1];[430-1160];[1203-1211];[1389-1391];[1590-1617]


113


[1-487]


[488-1634]


114


[1-70];[86-496]


[71-85];[497-693]


115


[1-358];[360-558]


[359-359];[559-784]


116


[1-215];[218-495];[527-607]


[216-217];[496-526];[608-804]


117


[1-466]


[467-484]


118


[1-515];[906-963]


[516-905];[964-985]


119


[1-744];[746-816]


[745-745];[817-839]


120 


[1-85];[87-521]


[86-86];[522-583]


121 


[1-532]


[533-1024]


122 


[1-318];[325-517];[567-660]


[319-324];[518-566];[661-760]


123


[1-498]


[499-594]


124 


[1-427]


[428-559]


125


[1-642]


[643-744]


126 


[1-341];[350-696]


[342-349];[697-824]


127 


[1-482]


[483-526]


128 


[1-338]


[339-618]


129 


[1-191];[193-429];[450-678]


[192-192];[430-449];[679-776]


130


[19-463];[465-544]


[1-18];[464-464];[545-998]


131


[1-470]


[471-779]


132


[1-533]


[534-1025]


133


[1-498]


[499-607]


134


[1-168];[170-326];[328-471];[552-738]


[169-169];[327-327];[472-551];[739-774]


135 


[1-346];[348-395];[440-473]


[347-347];[396-439];[474-611]


136


[1-324];[343-436]


[325-342];[437-925]


137


[1-186];[188-251];[255-517]


[187-187];[252-254];[518-674]


138


[1-488]


[489-1725]


139


[1-101];[103-190];[292-327];[1091-1161];[1228-1314]


[102-102];[191-291];[328-1090];[1162-1227];[1315-1474]


140 


[1-465];[516-653]


[466-515] 


141 


[1-761];[763-857];[912-1326]


[762-762];[858-911];[1327-1490]


142 


[1-476] 


[477-661] 


143 


[1-531];[1471-1508];[1510-1547];[1587-1661] 


[532-1470];[1509-1509];[1548-1586];[1662- 
1789] 


144 


[1-492];[503-536]


[493-502];[537-2006]


145 


[1-570] 


[571-1096] 


146 


[1-536];[621-703];[729-1075];[1198-1445]


[537-620];[704-728];[1076-1197];[1446-1666]


147 


[1-555];[578-628]


[556-577];[629-1687] 


148


[1-444];[1201-1474];[1480-1516] 


[445-1200];[1475-1479];[1517-1747]


149


[1-613];[626-658]


[614-625] 


150 


[4-199];[201-419];[421-492] 


[1-3];[200-200];[420-420];[493-2045]


151 


[1-509] 


[510-788] 


152 


[1-483];[485-578]


[484-484];[579-1931] 


153 


[1-497] 


[498-514] 


154 


[5-509];[579-763];[765-1162]


[1-4];[510-578];[764-764];[1163-1183]


155 


[1-486];[1095-1500] 


[487-1094];[1501-1545] 


156 


[1-488];[740-797];[799-884];[895-974]


[489-739];[798-798];[885-894];[975-1068] 


157 


[1-161];[163-565];[567-701]


[162-162];[566-566];[702-1097]


158 


[1-496];[692-754]


[497-691];[755-894] 


159 


[1-483] 


[484-703] 


160 


[1-494] 


[495-849] 


161 


[1-491] 


[492-846] 


162 


[1-505];[575-759];[761-1164]


[506-574];[760-760];[1165-1176]


163 


[1-699] 


[700-1084] 


164 


[38-483];[485-556] 


[1-37];[484-484];[557-1793]


165 


[1-426];[1303-1444];[1717-1755];[1787-1825] 


[427-1302];[1445-1716];[1756-1786];[1826- 
1849] 


166 


[2-264];[266-446];[448-519] 


[1-1];[265-265];[447-447];[520-1748]


167 


[1-519];[523-552]


[520-522];[553-1275]


168 


[1-457];[466-571]


[458-465];[572-1023] 


169 


[1-54];[57-501]


[55-56];[502-1085] 


170 


[1-541] 


[542-776] 


171 


[1-489] 


[490-1219] 


172 


[1-538];[977-1468]


[539-976];[1469-1487]


173 


[1-631] 


[632-1915] 


174 


[21-776];[888-967];[969-1061];[1063-1137];[1819-1967]


[1-20];[777-887];[968-968];[1062-1062];[1138-1818];[1968-1990]


175 


[1-508] 


[509-1971] 


176 


[1-127];[129-538];[979-1470]


[128-128];[539-978];[1471-1613] 


177 


[1-535];[973-1173];[1177-1330];[1332-1361]


[536-972];[1174-1176];[1331-1331]


178 


[1-599];[626-830];[1082-1113]


[600-625];[831-1081] 


179 


[1-623];[1377-1406] 


[624-1376];[1407-1960] 


180 


[1-414];[418-464]


[415-417];[465-1443] 


181 


[1-522];[533-587]


[523-532];[588-605]


182 


[1-78];[99-131];[136-327];[1153-1184];[1210-1274];[1284-1319];[1385-1416]


[79-98];[132-135];[328-1152];[1185-1209];[1275-1283];[1320-1384];[1417-1724]


183 


[1-512];[617-805];[871-952];[1387- 
1422];[1621-1661] 


[513-616];[806-870];[953-1386];[1423- 
1620];[1662-1686] 


184 


[1-453] 


[454-463] 


185 


[1-773] 


None 


186 


[1-413];[423-604];[606-739]


[414-422];[605-605];[740-753]


187


[1-117];[119-401]


[118-118];[402-754]


188


[1-511];[684-870];[872-928];[935-981]


[512-683];[871-871];[929-934];[982-998]


189


[1-605]


None 


190


[2-475]


[1-1];[476-526]


191


[1-910]


None 


192


[1-101];[103-668]


[102-102]


193


[1-520];[583-637]


[521-582]


194


[1-706]


None


195


[1-145];[150-451];[466-670]


[146-149];[452-465]


196


[1-509]


[510-510]


197


[1-500]


None 


198


[1-503];[505-585]


[504-504];[586-667]


199


[1-498]


[499-514]


200


[1-462]


None 


201


[1-551]


None 


202


[1-482];[484-550]


[483-483]


203


[1-408]


None 


204


[1-519];[521-649]


[520-520];[650-665]


205


[1-261];[263-415];[417-640];[642-782]


[262-262];[416-416];[641-641];[783-1008]


206


[1-455]


None 


207


[1-402];[410-526]


[403-409];[527-749]


208


[1-520]


[521-594]


209


[1-197];[200-472]


[198-199];[473-2098]


210


[1-311];[314-427]


[312-313];[428-428]


211


[1-689];[735-769]


[690-734]


212


[1-517]


[518-914]


213 


[2-576];[756-795];[1390-1441] 


[1-1];[577-755];[796-1389];[1442-1489]


214 


[1-482] 


[483-776] 


215 


[1-498] 


[499-1412] 


216 


[1-505];[1000-1293];[1295-1408];[1744-1773] 


[506-999];[1294-1294];[1409-1743]


217 


[1-102];[104-291];[293-467];[486-708];[723-831];[833-900];[910-1031];[1054-1090];[1097-1153]


[103-103];[292-292];[468-485];[709-722];[832-832];[901-909];[1032-1053];[1091-1096];[1154-1251]


218


[1-452]


[453-894]


219


[1-554];[556-598]


[555-555];[599-910]


220


[1-38];[41-95];[98-386];[388-487]


[39-40];[96-97];[387-387];[488-519]


221


[1-34];[38-220];[222-335];[337-518]


[35-37];[221-221];[336-336];[519-632]


222


[1-468]


[469-652]


223 


[1-466] 


[467-650] 


224 


[1-466] 


[467-502] 


225 


[1-489];[653-1008]


[490-652];[1009-1739] 


226 


[1-657] 


None


227


[1-480] 


[481-888] 


228 


[1-501] 


[502-716] 


229 


[1-612] 


[613-654] 


230 


[1-477];[485-538]


[478-484];[539-635] 


231 


[1-476];[484-537]


[477-483];[538-634] 


232 


[1-367];[371-512] 


[368-370];[513-583]


233 


[1-305];[307-442];[460-503];[553-646]


[306-306];[443-459];[504-552];[647-753] 


234 


[1-260];[262-345];[347-454];[473-515];[565-658]


[261-261];[346-346];[455-472];[516-564];[659- 
762] 


235 


[1-427] 


[428-537] 


236 


[1-465] 


[466-994] 


237 


[1-471];[496-526];[557-587];[597-637]


[472-495];[527-556];[588-596];[638-662]


238 


[1-338];[352-497]


[339-351];[498-1829]


239 


[1-501] 


[502-1083] 


240 


[1-515];[1527-1583];[1585-1687];[1692-1831] 


[516-1526];[1584-1584];[1688-1691] 


241 


[1-515];[1526-1582];[1584-1686];[1691-1830] 


[516-1525];[1583-1583];[1687-1690]


Table Vb



Seq Id No 


Preferentially excluded fragments 


Preferentially included fragments 


1 


[1-540];[556-615];[2061-2096];[2098-2201]


[541-555];[616-2060];[2097-2097] 


2 


[1-511];[533-619];[621-690];[730-1132]


[512-532];[620-620];[691-729];[1133-1631] 


3 


[2-539];[1178-1245]


[1-1];[540-1177] 


4 


[1-250];[297-383];[386-514];[1025-1064]


[251-296];[384-385];[515-1024];[1065-1623]


5 


[27-116];[118-391]


[1-26];[117-117];[392-1454] 


6 


[1-93];[96-168];[170-262];[264-461]


[94-95];[169-169];[263-263];[462-1639] 


7 


[1-95];[97-451]


[96-96];[452-1768] 


8 


[1-502];[1314-1491] 


[503-1313];[1492-1510] 


9 


[1-864] 


[865-882] 


10 


[1-428] 


[429-1849] 


11 


[1-454];[482-514]


[455-481];[515-565] 


12 


[1-375];[379-511];[533-690];[730-783];[814-1164]


[376-378];[512-532];[691-729];[784-813];[1165- 
1663] 


13 


[2-337];[339-556] 


[1-1];[338-338];[557-744]


14 


[29-366];[368-507] 


[1-28];[367-367];[508-1759]


15 


[29-366];[368-524] 


[1-28];[367-367];[525-1755]


16 


[1-641] 


[642-936] 


17 


[1-708];[711-747]


[709-710] 


18 


[1-639] 


[640-1884] 


19 


[1-631] 


[632-691] 


20 


[3-416];[418-490] 


[1-2];[417-417];[491-1138] 


21 


[1-468] 


None 


22 


[1-720] 


None 


23 


[1-711] 


[712-727] 


24 


[1-469] 


[470-470]


25 


[1-231];[234-488]


[232-233];[489-987] 


26 


[1-296];[300-642];[644-737]


[297-299];[643-643];[738-908] 


27 


[1-306];[308-762]


[307-307] 


28 


[1-446];[448-1102]


[447-447] 


29 


[1-436] 


None 


30 


[7-334];[1420-1468];[1474-1614];[1616- 
1804];[1845-1919] 


[1-6];[335-1419];[1469-1473];[1615-1615];[1805-1844];[1920-1938]


31 


[1-342];[345-519];[823-893];[977-1016]


[343-344];[520-822];[894-976];[1017-1116]


32 


[1-517];[821-891];[975-1014] 


[518-820];[892-974];[1015-1114]


33 


[36-352];[354-457];[728-832];[834-1096];[1253-1289];[1291-1350];[1352-1412];[1726-1873]


[1-35];[353-353];[458-727];[833-833];[1097-1252];[1290-1290];[1351-1351];[1413-1725];[1874-2072]


34 


[1-409] 


None 


35 


[14-105] 


[1-13];[106-836]


36


[1-572];[1120-1271]


[573-1119];[1272-1323]


37 


[20-98];[100-510];[1591-1681];[1683-1870] 


[1-19];[99-99];[511-1590];[1682-1682];[1871-1945]


38 


[1-547] 


[548-1330] 


39 


[1-445] 


[446-2124] 


40 


[1-473];[475-528]


[474-474];[529-1159]


41 


[16-506];[1587-1866] 


[1-15];[507-1586];[1867-1953]


42 


[2-234];[244-451];[974-1226] 


[1-1];[235-243];[452-973];[1227-1688]


43 


[1-455];[1670-1925]


[456-1669];[1926-1942] 


44 


[1-579];[815-1031] 


[580-814];[1032-1657] 


45 


[1-489];[1012-1264] 


[490-1011];[1265-1733]


46 


[1-400];[1184-1223];[1225-1705];[1740- 
1818] 


[401-1183];[1224-1224];[1706-1739];[1819-1871]


47 


[1-529];[1326-1505] 


[530-1325];[1506-1523] 


48 


[1-131];[133-510];[560-589]


[132-132];[511-559];[590-832] 


49 


[1-130];[132-509];[559-588]


[131-131];[510-558];[589-831] 


50 


[1-650];[652-868];[873-913]


[651-651];[869-872];[914-917] 


51 


[1-504];[515-605]


[505-514];[606-621]


52 


[1-535] 


[536-673] 


53 


[2-563] 


[1-1];[564-897]


54 


[1-527];[802-870];[882-934];[966-1018];[1037-1080]


[528-801];[871-881];[935-965];[1019-1036];[1081-1101]


55 


[1-326];[328-505]


[327-327];[506-1047] 


56 


[1-340] 


[341-925] 


57 


[1-528] 


[529-1240] 


58 


[1-108];[115-151];[154-340];[342-529]


[109-114];[152-153];[341-341];[530-902]


59 


[4-485];[1566-1656];[1658-1845] 


[1-3];[486-1565];[1657-1657];[1846-1969]


60 


[1-283] 


[284-1132] 


61 


[9-468] 


[1-8];[469-631]


62 


[1-525];[689-722]


[526-688] 


63 


[1-88];[90-192];[194-265];[296-409]


[89-89];[193-193];[266-295];[410-1442] 


64 


[1-517] 


[518-795] 


65 


[1-406];[408-739]


[407-407];[740-1236] 


66 


[1-489];[849-881]


[490-848] 


67 


[1-505] 


[506-524] 


68 


[1-325];[328-441];[444-504]


[326-327];[442-443];[505-1472] 


69 


[1-524];[636-715];[717-809];[811-885];[1567-1715]


[525-635];[716-716];[810-810];[886-1566];[1716-1737]


70 


[12-487] 


[1-11];[488-1637]


71 


[12-487] 


[1-11];[488-1636] 


72 


[1-451] 


[452-1758] 


73 


[1-167];[242-464]


[168-241];[465-1647] 


74 


[1-167];[242-464]


[168-241];[465-1646]


75


[1-471] 


[472-1963] 


76 


[1-358];[360-543];[655-734];[736-828];[830-904];[1586-1734]


[359-359];[544-654];[735-735];[829-829];[905- 
1585];[1735-1757] 


77 


[3-34];[36-474];[582-770];[1709-1746];[1748-1785];[1825-1899]


[1-2];[35-35];[475-581];[771-1708];[1747-1747];[1786-1824];[1900-2027]


78 


[1-75];[77-319];[914-1052];[1063-1126];[1168-1203]


[76-76];[320-913];[1053-1062];[1127- 
1167];[1204-1880] 


79 


[1-425]


[426-584] 


80 


[1-752];[947-1017];[1084-1170]


[753-946];[1018-1083];[1171-1351]


81 


[1-496];[498-720]


[497-497] 


82 


[1-324] 


[325-1029] 


83 


[1-477];[1474-1529];[1537-1566];[1577-1616];[1622-1662];[1717-1753]


[478-1473];[1530-1536];[1567-1576];[1617- 
1621];[1663-1716];[1754-1788] 


84 


[1-496];[499-568];[752-805]


[497-498];[569-751] 


85 


[1-527] 


[528-814] 


86 


[1-360] 


[361-598] 


87 


[1-78];[80-583];[625-699]


[79-79];[584-624] 


88 


[1-889] 


[890-905] 


89 


[1-513] 


[514-514] 


90 


[1-122];[124-155];[157-435];[437-517]


[123-123];[156-156];[436-436];[518-518] 


91 


[1-133];[165-808]


[134-164] 


92 


[1-725] 


[726-737] 


93 


[1-409] 


[410-728] 


94 


[1-331] 


[332-582] 


95 


[1-410] 


[411-1913] 


96 


[1-501] 


[502-670] 


97 


[1-141];[143-431] 


[142-142];[432-939] 


98 


[1-193] 


[194-661] 


99 


[1-629] 


[630-647] 


100 


[1-520];[862-954];[976-1005]


[521-861];[955-975];[1006-1006] 


101 


[1-489];[581-961];[1010-1059] 


[490-580];[962-1009]


102 


[1-485] 


[486-514] 


103 


[1-540] 


[541-1158] 


104 


[1-556] 


[557-1563] 


105 


[1-868];[870-1006]


[869-869];[1007-1621] 


106 


[1-491] 


[492-557] 


107 


[1-573] 


[574-600] 


108 


[1-457];[586-1110]


[458-585];[1111-1129]


109 


[1-521];[655-778]


[522-654] 


110 


[1-416];[478-614];[616-990];[992-1065];[1068-1283]


[417-477];[615-615];[991-991];[1066-1067];[1284-1301]


111


[1-416];[478-614];[628-989];[991- 
1064];[1067-1282] 


[417-477];[615-627];[990-990];[1065-1066];[1283-1300]


112 


[2-429];[1161-1202];[1212-1388];[1392- 
1589] 


[1-1];[430-1160];[1203-1211];[1389-1391];[1590-1617]


113 


[1-487] 


[488-1634] 


114 


[1-70];[86-496]


[71-85];[497-693] 


115 


[1-358];[360-558]


[359-359];[559-784] 


116 


[1-215];[218-495];[527-607]


[216-217];[496-526];[608-804]


117 


[1-466] 


[467-484] 


118 


[1-515];[906-963]


[516-905];[964-985] 


119 


[1-744];[746-816]


[745-745];[817-839]


120 


[1-85];[87-521]


[86-86];[522-583] 


121 


[1-532] 


[533-1024] 


122 


[1-318];[325-517];[567-660]


[319-324];[518-566];[661-760] 


123 


[1-498] 


[499-594] 


124 


[1-427] 


[428-559] 


125 


[1-642] 


[643-744] 


126 


[1-341];[350-696]


[342-349];[697-824] 


127 


[1-482] 


[483-526] 


128 


[1-338] 


[339-618] 


129 


[1-191];[193-429];[450-678]


[192-192];[430-449];[679-776]


130 


[19-463];[465-544] 


[1-18];[464-464];[545-998]


131 


[1-470] 


[471-779] 


132 


[1-533] 


[534-1025] 


133 


[1-498] 


[499-607] 


134 


[1-168];[170-326];[328-471];[552-738]


[169-169];[327-327];[472-551];[739-774]


135 


[1-346];[348-395];[440-473]


[347-347];[396-439];[474-611]


136 


[1-324];[343-436]


[325-342];[437-925] 


137 


[1-186];[188-251];[255-517] 


[187-187];[252-254];[518-674]


138 


[1-488] 


[489-1725] 


139 


[1-101];[103-190];[292-327];[1091- 
1161];[1228-1314] 


[102-102];[191-291];[328-1090];[1162-1227];[1315-1474]


140 


[1-465];[516-653]


[466-515] 


141 


[1-761];[763-857];[912-1326]


[762-762];[858-911];[1327-1490]


142 


[1-476] 


[477-661] 


143 


[1-531];[1471-1508];[1510-1547];[1587-1661]


[532-1470];[1509-1509];[1548-1586];[1662- 
1789] 


144 


[1-492];[503-536]


[493-502];[537-2006] 


145 


[1-570] 


[571-1096] 


146 


[1-536];[621-703];[729-1075];[1198-1445]


[537-620];[704-728];[1076-1197];[1446-1666]


147 


[1-555];[578-628]


[556-577];[629-1687] 


148 


[1-444];[1201-1474];[1480-1516] 


[445-1200];[1475-1479];[1517-1747]


149


[1-613];[626-658]


[614-625] 


150 


[4-199];[201-419];[421-492]


[1-3];[200-200];[420-420];[493-2045]


151 


[1-509] 


[510-788] 


152 


[1-483];[485-578]


[484-484];[579-1931]


153 


[1-497] 


[498-514] 


154 


[5-509];[579-763];[765-1162]


[1-4];[510-578];[764-764];[1163-1183]


155 


[1-486];[1095-1500] 


[487-1094];[1501-1545] 


156 


[1-488];[740-797];[799-884];[895-974]


[489-739];[798-798];[885-894];[975-1068] 


157 


[1-161];[163-565];[567-701]


[162-162];[566-566];[702-1097]


158 


[1-496];[692-754]


[497-691];[755-894]


159 


[1-483] 


[484-703] 


160 


[1-494] 


[495-849] 


161 


[1-491] 


[492-846] 


162 


[1-505];[575-759];[761-1164]


[506-574];[760-760];[1165-1176]


163 


[1-699] 


[700-1084] 


164 


[38-483];[485-556] 


[1-37];[484-484];[557-1793]


165 


[1-426];[1303-1444];[1717-1755];[1787- 
1825] 


[427-1302];[1445-1716];[1756-1786];[1826- 
1849] 


166 


[2-264];[266-446];[448-519] 


[1-1];[265-265];[447-447];[520-1748]


167 


[1-519];[523-552]


[520-522];[553-1275] 


168 


[1-457];[466-571]


[458-465];[572-1023] 


169 


[1-54];[57-501]


[55-56];[502-1085] 


170 


[1-541] 


[542-776] 


171 


[1-489] 


[490-1219] 


172 


[1-538];[977-1468]


[539-976];[1469-1487] 


173 


[1-631] 


[632-1915] 


174 


[21-776];[888-967];[969-1061];[1063- 
1137];[1819-1967] 


[1-20];[777-887];[968-968];[1062-1062];[1138-1818];[1968-1990]


175 


[1-508] 


[509-1971] 


176 


[1-127];[129-538];[979-1470]


[128-128];[539-978];[1471-1613] 


177 


[1-535];[973-1173];[1177-1330];[1332-1361]


[536-972];[1174-1176];[1331-1331]


178 


[1-599];[626-830];[1082-1113]


[600-625];[831-1081] 


179 


[1-623];[1377-1406] 


[624-1376];[1407-1960] 


180 


[1-414];[418-464]


[415-417];[465-1443] 


181 


[1-522];[533-587]


[523-532];[588-605] 


182 


[1-78];[99-131];[136-327];[1153-1184];[1210-1274];[1284-1319];[1385-1416]


[79-98];[132-135];[328-1152];[1185-1209];[1275-1283];[1320-1384];[1417-1724]


183 


[1-512];[617-805];[871-952];[1387- 
1422];[1621-1661] 


[513-616];[806-870];[953-1386];[1423- 
1620];[1662-1686] 


184 


[1-453] 


[454-463] 


185 


[1-773] 


None 


186 


[1-413];[423-604];[606-739]


[414-422];[605-605];[740-753]


187


[1-117];[119-401]


[118-118];[402-754] 


188 


[1-511];[684-870];[872-928];[935-981]


[512-683];[871-871];[929-934];[982-998] 


189 


[1-605] 


None 


190 


[2-475] 


[1-1];[476-526]


191 


[1-910] 


None 


192 


[1-101];[103-668]


[102-102] 


193 


[1-520];[583-637]


[521-582] 


194 


[1-706] 


None 


195 


[1-145];[150-451];[466-670]


[146-149];[452-465] 


196 


[1-509] 


[510-510] 


197 


[1-500] 


None 


198 


[1-503];[505-585]


[504-504];[586-667]


199 


[1-498] 


[499-514] 


200 


[1-462] 


None 


201 


[1-551] 


None 


202 


[1-482];[484-550]


[483-483] 


203 


[1-408] 


None 


204 


[1-519];[521-649]


[520-520];[650-665]


205 


[1-261];[263-415];[417-640];[642-782]


[262-262];[416-416];[641-641];[783-1008] 


206 


[1-455] 


None 


207 


[1-402];[410-526]


[403-409];[527-749]


208 


[1-520] 


[521-594] 


209 


[1-197];[200-472]


[198-199];[473-2098]


210 


[1-311];[314-427]


[312-313];[428-428] 


211 


[1-689];[735-769]


[690-734] 


212 


[1-517] 


[518-914] 


213 


[2-576];[756-795];[1390-1441] 


[1-1];[577-755];[796-1389];[1442-1489]


214 


[1-482] 


[483-776] 


215 


[1-498] 


[499-1412] 


216 


[1-505];[1000-1293];[1295-1408];[1744- 
1773] 


[506-999];[1294-1294];[1409-1743] 


217 


[1-102];[104-291];[293-467];[486-708];[723-831];[833-900];[910-1031];[1054-1090];[1097-1153]


[103-103];[292-292];[468-485];[709-722];[832-832];[901-909];[1032-1053];[1091-1096];[1154-1251]


218 


[1-452] 


[453-894] 


219 


[1-554];[556-598]


[555-555];[599-910] 


220 


[1-38];[41-95];[98-386];[388-487]


[39-40];[96-97];[387-387];[488-519] 


221 


[1-34];[38-220];[222-335];[337-518]


[35-37];[221-221];[336-336];[519-632]


222 


[1-468] 


[469-652] 


223 


[1-466] 


[467-650] 


224 


[1-466] 


[467-502] 


225 


[1-489];[653-1008]


[490-652];[1009-1739] 






226 


[1-657] 


None 


227 


[1-480] 


[481-888] 


228 


[1-501] 


[502-716] 


229 


[1-612] 


[613-654] 


230 


[1-477];[485-538]


[478-484];[539-635] 


231 


[1-476];[484-537]


[477-483];[538-634] 


232 


[1-367];[371-512] 


[368-370];[513-583]


233 


[1-305];[307-442];[460-503];[553-646]


[306-306];[443-459];[504-552];[647-753]


234 


[1-260];[262-345];[347-454];[473-515];[565-658]


[261-261];[346-346];[455-472];[516-564];[659-762]


235 


[1-427] 


[428-537] 


236 


[1-465] 


[466-994] 


237 


[1-471];[496-526];[557-587];[597-637]


[472-495];[527-556];[588-596];[638-662] 


238 


[1-338];[352-497]


[339-351];[498-1829] 


239 


[1-501] 


[502-1083] 


240 


[1-515];[1527-1583];[1585-1687];[1692- 
1831] 


[516-1526];[1584-1584];[1688-1691] 


241 


[1-515];[1526-1582];[1584-1686];[1691- 
1830] 


[516-1525];[1583-1583];[1687-1690]


Table VI



Seq Id No 


Designation of domain 


Database 


Positions of 
domains 


242 


Cell attachment sequence 


PROSITE 


141-143 


242 


Peptidase family M20/M25/M40 


PFAM 


107-451 


244 


Mitochondrial energy transfer proteins
signature


PROSITE 


26-35 


244 


Mitochondrial energy transfer proteins 
signature 


PROSITE 


199-208 


244 


Mitochondrial carrier proteins 


PFAM 


5-84;87-175;178-272


244 


Mitochondrial energy transfer proteins.


BLOCKSPLUS 


12-36 


244 


Mitochondrial energy transfer proteins. 


BLOCKSPLUS 


13-36 


244 


Mitochondrial energy transfer proteins. 


BLOCKSPLUS 


131-144 


245 


Leucine zipper pattern 


PROSITE 


371-392 


249 


Leucine zipper pattern 


PROSITE 


20-41 


251 


Mitochondrial energy transfer proteins 
signature 


PROSITE 


26-35 


251


Mitochondrial carrier proteins


PFAM


J-/Z 


251 


Mitochondrial energy transfer proteins. 


BLOCKSPLUS 


12-36 


251 


Mitochondrial energy transfer proteins. 


BLOCKSPLUS


13-36 


254 


Pancreatic ribonuclease family signature 


PROSITE 


63-69 


254 


Pancreatic ribonucleases 


PFAM 


26-143 


254 


PANCREATIC RIBONUCLEASE FAMILY 
SIGNATURE 


BLOCKSPLUS 


49-69 


254 


Pancreatic ribonuclease family proteins. 


BLOCKSPLUS 


115-140 


254 


PANCREATIC RIBONUCLEASE FAMILY
SIGNATURE


BLOCKSPLUS 


92-110 


254 


PANCREATIC RIBONUCLEASE FAMILY


BLOCKSPLUS


114-133 


254 


Pancreatic ribonuclease family proteins. 


BLOCKSPLUS


30-40


254 


PANCREATIC RIBONUCLEASE FAMILY
SIGNATURE


BLOCKSPLUS 


114-137 


254 


PANCREATIC RIBONUCLEASE FAMILY
SIGNATURE 




09-00 


255 


L-lactate dehydrogenase active site 


PROSITE 


239-245 


255 


lactate/malate dehydrogenase 


PFAM 


71-380 


255 


L-lactate dehydrogenase proteins. 


BLOCKSPLUS 


186-224 


255 


L-LACTATE DEHYDROGENASE 
SIGNATURE 


BLOCKSPLUS 


96-121 


255 


L-lactate dehydrogenase proteins. 


BLOCKSPLUS 


71-102 


255 


L-lactate dehydrogenase proteins. 


BLOCKSPLUS 


238-256


255


L-LACTATE DEHYDROGENASE
SIGNATURE


BLOCKSPLUS


1 

loJ-zU3 


255


L-lactate dehydrogenase proteins.


BLOCKSPLUS




255


L-LACTATE DEHYDROGENASE
SIGNATURE


BLOCKSPLUS


207-224


255 


L-LACTATE DEHYDROGENASE 
SIGNATURE 


BLOCKSPLUS 


71-92 


256


L-lactate dehydrogenase proteins.


BLOCKSPLUS


HQ 1 £7 
1 JO-lO / 


256 


lactate/malate dehydrogenase 


PFAM 


71-124 


256


L-LACTATE DEHYDROGENASE
SIGNATURE


BLOCKSPLUS


96-121


256 


L-lactate dehydrogenase proteins. 


BLOCKSPLUS 


71-102 


256 


L-LACTATE DEHYDROGENASE
SIGNATURE


BLOCKSPLUS


71-92 


256


L-lactate dehydrogenase proteins.




71-100


256


L-LACTATE DEHYDROGENASE
SIGNATURE


BLOCKSPLUS


71 

/ 1-54 


257


Leucine zipper pattern


PROSITE


1 J J- 1 / o 


z jy 


IT \J XvlVl r\ UOlIld.111 


JrJr/VLvi 


zz-z JU 


261


Leucine zipper pattern


PROSITE


142-163


261


Leucine zipper pattern


PROSITE


170-191


967 

ZOO 


Leucine zipper pattern




1 J-jO 


264


Ubiquitin family


PFAM


1-82


264


Ubiquitin domain proteins.


BLOCKSPLUS


17-62


264


Ubiquitin domain proteins.


BLOCKSPLUS


21-68


264


Ubiquitin domain proteins.


BLOCKSPLUS


26-68


264


Ubiquitin domain proteins.


BLOCKSPLUS


1 7 6ft 
1 /-oo 


266


u-PAR/Ly-6 domain


PFAM


6H 1 1 Q 

ou-i iy 


266


Squash family of serine protease inhibit


PFAM




267


Zinc finger, C2H2 type, domain proteins.


BLOCKSPLUS


1 ft^ 7H7 
1 o^-ZUZ 


271


LBP / BPI / CETP family signature


PROSITE


7ft 60 


271


Pyrokinins signature


PROSITE


1")A 77ft 


271


LBP / BPI / CETP family


PFAM


1 0./17Q 


271


LBP / BPI / CETP family proteins.


BLOCKSPLUS


72-118


271 


LBP / BPI / CETP family proteins. 


BLOCKSPLUS 


209-253 


271


LBP / BPI / CETP family proteins.


BLOCKSPLUS


70 ^ft 


271


LBP / BPI / CETP family proteins.


BLOCKSPLUS


77^ 1AQ 


271 


LBP / BPI / CETP family proteins. 


BLOCKSPLUS 


76-113 


272 


Zinc finger, C3HC4 type (RING finger), 
signature 


PROSITE 


102-111 


272 


Zinc finger, C3HC4 type (RING finger) 


PFAM 


87-129 


272 


Zinc finger, C3HC4 type (RING finger), 
proteins. 


BLOCKSPLUS 


102-111


273


Zinc finger, C3HC4 type (RING finger), 
signature 


PROSITE 


30-39 


273 


Zinc finger, C3HC4 type (RING finger) 


PFAM 


15-57 


273 


Zinc finger, C3HC4 type (RING finger), 
proteins. 


BLOCKSPLUS 


30-39 


274


RNA 3'-terminal phosphate cyclase signature


PROSITE


K7 1 en 

Id /-lo/ 


274


RNA 3'-terminal phosphate cyclase


PFAM 


1-365


274 


RNA 3'-terminal phosphate cyclase proteins. 


BLOCKSPLUS 


12-44


274 


RNA 3'-terminal phosphate cyclase proteins. 


BLOCKSPLUS 


157-168 


275 


Ribosomal L27 protein 


PFAM 


31-86 


277 


Cell attachment sequence 


PROSITE 


292-294 


277 


DHHC zinc finger domain 


PFAM 


140-204 


279 


Endogenous opioids neuropeptides precursors 
signature 


PROSITE 


26-65 


279 


Vertebrate endogenous opioids neurope 


PFAM 


3-257 


279 


Endogenous opioids neuropeptides precursors 
proteins. 


BLOCKSPLUS 


100-126 


279 


Endogenous opioids neuropeptides precursors 
proteins. 


BLOCKSPLUS 


209-237 


279 


Endogenous opioids neuropeptides precursors 
proteins. 


BLOCKSPLUS 


43-66 


279 


Endogenous opioids neuropeptides precursors 
proteins. 


BLOCKSPLUS 


18-38 


279 


Endogenous opioids neuropeptides precursors 
proteins. 


BLOCKSPLUS 


24-36 


279 


Endogenous opioids neuropeptides precursors 
proteins. 


BLOCKSPLUS 


105-125 


280 


Leucine zipper pattern 


PROSITE 


136-157 


280 


Leucine zipper pattern 


PROSITE 


272-293 


283 


Immunoglobulins and major histocompatibility 
complex proteins signature 


PROSITE 


380-386 


283 


Immunoglobulin domain 


PFAM 


205-285;318- 
384 


283 


Immunoglobulins and major histocompatibility
complex proteins. 


BLOCKSPLUS 


319-336 


284 


Fucosyl transferase 


PFAM 


70-406 


285 


FAD/NAD-binding Cytochrome reductase 


PFAM 


27-149 


285 


Oxidoreductase FAD/NAD-binding domain 


PFAM 


176-290 


285 


Eukaryotic molybdopterin oxidoreductases 
proteins. 


BLOCKSPLUS 


58-86 


285 


CYTOCHROME B5 REDUCTASE 
SIGNATURE 


BLOCKSPLUS 


75-86 
285


CYTOCHROME B5 REDUCTASE 


BLOCKSPLUS 


274-283 


285 


CYTOCHROME B5 REDUCTASE 


BLOCKSPLUS 


141-156 


285 


Eukaryotic molybdopterin oxidoreductases 
proteins.


BLOCKSPLUS 


274-286 


285 


Eukaryotic molybdopterin oxidoreductases 
proteins.


BLOCKSPLUS 


60-85 


285 


CYTOCHROME B5 REDUCTASE 


BLOCKSPLUS 


181-198 


285 


FLAVOPROTEIN PYRIDINE NUCLEOTIDE 
CYTOCHROME REDUCTASE SIGNATURE 


BLOCKSPLUS 


181-197 


286 


Immunoglobulins and major histocompatibility 
complex proteins signature 


PROSITE 


380-386 


286 


Immunoglobulin domain 


PFAM 


205-285;318-384


286 


Immunoglobulins and major histocompatibility 
complex proteins. 


BLOCKSPLUS 


319-336 


287 


Leucine zipper pattern 


PROSITE 


126-147 


288 


Leucine zipper pattern 


PROSITE 


20-41 


291 


Tissue inhibitors of metalloproteinases
signature 


PROSITE 


24-36 


291


Tissue inhibitor of metalloproteinases


PFAM


22-199


291


Tissue inhibitors of metalloproteinases proteins.


BLOCKSPLUS


21-46


291


Tissue inhibitors of metalloproteinases proteins.


BLOCKSPLUS


106-145


291


Tissue inhibitors of metalloproteinases proteins.


BLOCKSPLUS


81-95


291


Tissue inhibitors of metalloproteinases proteins.


BLOCKSPLUS


61-72


294 


Domain of unknown function DUF59 


PFAM


31-135 


296 


Immunoglobulin domain 


PFAM 


141-197 


297 


TonB-dependent receptor proteins signature 1 


PROSITE 


1-42 


298 


Fibroblast growth factor 


PFAM 


48-129 


299 


BolA-like protein


PFAM 


39-114 


299 


PROTEIN BOLA TRANSCRIPTION 
REGULATION AC. 


BLOCKSPLUS 


68-98 


301 


Cell attachment sequence 


PROSITE 


172-174 


303 


Ribosomal L27 protein


PFAM 


31-115 


304 


Leucine rich repeat C-terminal domain 


PFAM 


173-222 


304 


Leucine Rich Repeat 


PFAM 


92-115;116-139;140-163;164-185


309 


Leucine rich repeat C-terminal domain 


PFAM 


173-222 


309 


Leucine Rich Repeat 


PFAM 


92-115;116-139;140-163;164-185
311


NOL1/NOP2/sun family


PFAM 


201-276;353-378


311 


NOL1/NOP2/sun family proteins.


BLOCKSPLUS


z3U-Z4j 


311


NOL1/NOP2/sun family proteins.




111 IAS. 

Zj 1-Z4j 


312


NOL1/NOP2/sun family


PFAM


ZUl-z /o 


312


NOL1/NOP2/sun family proteins.


BLOCKSPLUS


zJ(J-z4j 


312


NOL1/NOP2/sun family proteins.


BLOCKSPLUS


Z31-z4j 


314


Leucine zipper pattern


PROSITE


o-zy 


315 


Leucine zipper pattern 


PROSITE


5-29


341 


Immunoglobulin domain 


PFAM


45-112


349 


CDP-alcohol phosphatidyltransferases signature


PROSITE


54-76


349


Cytochrome b/b6 Qo site signature


PROSITE


fi*7 1 AO 

yv-iuz 


354 


SAM domain (Sterile alpha motif) 


PFAM 


82-147 


361 


Ribosomal Proteins L2 


PFAM 


96-124 


368 


DAD family 


PFAM 


1-78 


370 


Ribosomal protein L34


PFAM 


51-92 


385 


Kelch motif 


PFAM 


20-66;68-114;116-162;164-209;211-265;270-316


386 


SPRY domain 


PFAM 


85-205 


388 


PHD-finger.


BLOCKSPLUS 


329-339 


389 


Eukaryotic thiol (cysteine) proteases histidine
active site


PROSITE 


268-278


389


Heat shock hsp70 proteins family signature 3 


PROSITE 


332-346 


389 


Hsp70 protein 


PFAM 


3-509 


390 


Eukaryotic-type carbonic anhydrase 


PFAM 


20-59 


391 


PMP-22/EMP/MP20/Claudin family 


PFAM 


4-162


392 


Sec1 family.


BLOCKSPLUS 


89-107 


393 


PMP-22/EMP/MP20/Claudin family


PFAM 


4-182 


394 


Myc-type, helix-loop-helix dimerization
domain signature


PROSITE 


13-28 


395 


Glutathione S-transferases. 


PFAM 


47-122;260-309 


396 


Transmembrane 4 family signature


PROSITE


112-134 


396 


Transmembrane 4 family 


PFAM 


66-273 


396 


Transmembrane 4 family proteins. 


BLOCKSPLUS 


108-146 


396 


TRANSMEMBRANE FOUR FAMILY 


BLOCKSPLUS 


129-151 


396 


TRANSMEMBRANE FOUR FAMILY 
SIGNATURE 


BLOCKSPLUS 


108-127 


396 


TRANSMEMBRANE FOUR FAMILY 
SIGNATURE 


BLOCKSPLUS 


247-274 
396


TRANSMEMBRANE FOUR FAMILY
SIGNATURE


BLOCKSPLUS 


129-150 


396 


TRANSMEMBRANE FOUR FAMILY
SIGNATURE


BLOCKSPLUS 


128-154 


397 


ATP/GTP-binding site motif A (P-loop) 


PROSITE 


6-13 


397 


ADP-ribosylation factor family 


PFAM 


2-172 


398 


Isochorismatase family 


PFAM 


17-147 


399 


PAP2 superfamily 


PFAM 


19-175 


400 


Zinc carboxypeptidases, zinc-binding region 2 
signature 


PROSITE 


117-127 


401 


Zinc finger, C2H2 type, domain 


PROSITE 


36-57 


401 


Zinc finger, C2H2 type, domain 


PROSITE 


73-93 


401 


Zinc finger, C2H2 type, domain 


PROSITE 


114-134 


401 


Zinc finger, C2H2 type, domain 


PROSITE 


145-165 


401 


Zinc finger, C2H2 type 


PFAM 


34-57;71-93;112-134;143-165


401 


Zinc finger, C2H2 type, domain proteins. 


BLOCKSPLUS 


145-162 


401 


Zinc finger, C2H2 type, domain proteins. 


BLOCKSPLUS 


114-131 


401 


Zinc finger, C2H2 type, domain proteins. 


BLOCKSPLUS 


73-90 


402 


Zinc finger, C2H2 type, domain 


PROSITE 


113-133 


402 


Zinc finger, C2H2 type, domain 


PROSITE 


144-164 


402 


Regulator of chromosome condensation 
(RCC1) signature 2 


PROSITE 


65-75 


402 


Zinc finger, C2H2 type 


PFAM 


111-133;142-164


402 


Zinc finger, C2H2 type, domain proteins. 


BLOCKSPLUS 


144-161 


402 


Zinc finger, C2H2 type, domain proteins. 


BLOCKSPLUS 


113-130 


403 


Glutathione S-transferases. 


PFAM 


47-122;260-309 


405 


PMP-22/EMP/MP20/Claudin family 


PFAM 


4-182 


406 


WD domain, G-beta repeat 


PFAM 


267-304;333-370


408 


Rhomboid family 


PFAM 


186-323 


410 


Ank repeat 


PFAM 


47-79 


410 


REPEAT PROTEIN ANK NUCLEAR 
ANKYR. 


BLOCKSPLUS 


78-89 


410 


Ank repeat proteins. 


BLOCKSPLUS 


48-56 


412 


Serine proteases, subtilase family, aspartic acid 
proteins. 


BLOCKSPLUS 


165-178 


414 


Sir2 family 


PFAM 


84-268 


416 


Kelch motif 


PFAM 


20-66;68-114;116-162;164-209;211-265;270-316


418 


Zinc-binding dehydrogenases 


PFAM 


16-313 


426 


Leucine zipper pattern 


PROSITE 


144-165 


447 


Cytochrome c family heme-binding site 
signature 


PROSITE 


19-24 


447 


Immunoglobulins and major histocompatibility 
complex proteins signature 


PROSITE 


17-23 


453 


eIF-6 family


PFAM 


3-103 


454 


Cell attachment sequence 


PROSITE 


226-228 


456 


Leucine zipper pattern 


PROSITE 


211-232 


457 


Leucine zipper pattern 


PROSITE 


236-257 


466 


Zinc finger, C3HC4 type (RING finger), 
signature 


PROSITE 


56-65 


466 


SPRY domain 


PFAM 


375-500 


466 


Zinc finger, C3HC4 type (RING finger) 


PFAM 


41-81 


466 


B-box zinc finger. 


PFAM 


110-153 


466 


Domain in SPla and the RYanodine Receptor. 


BLOCKSPLUS 


359-381 


466 


Domain in SPla and the RYanodine Receptor. 


BLOCKSPLUS 


443-457 


466 


Domain in SPla and the RYanodine Receptor. 


BLOCKSPLUS 


359-380 


466 


Zinc finger, C3HC4 type (RING finger), 
proteins. 


BLOCKSPLUS 


56-65 


479


UBX domain


PFAM


329-408


481 


TBC domain 


PFAM 


65-171 


481 


Probable rabGAP domain proteins. 


BLOCKSPLUS 


153-159 


482 


TBC domain 


PFAM 


65-177 


482 


Probable rabGAP domain proteins. 


BLOCKSPLUS 


153-159 
Table VII



Seq Id No 


Epitopes 


242 


98..109;119..127;136..147;156..170;242..248;255..265;318..328;356..363;399..407;443..450;475..490


941 


~\ Q-SQ fiS-fiQ 70-111 1yfi-14y 1SV1Q1 108-91? yyO-y^l 94 
5;302..315 


OAA 


70 16-11 49-70 87-110 147-960 974 

Zy.. JO, J J.. HZ, /y..o / , L Jy.. LH / ,ZO*7..Z / *t , 


94*^ 


101 107-141 1S1-1S6 16*v106 907-99S 911-949 9S1-9S1 9 

1U1..1v/j1't1..1j1jIj0..1 OJ, 1 "O. .Zv / ,ZZJ..Zjj,Z'tZ..ZJ 1 ,Z J J ..Z 

60-284 298-121 110-119 147*195 406 


247 


41 51-108 120-121 111-190 200*255 261*102 107 1 


248 


5 11*18 46-52 60*75 81*92 99-111 150*167 181*187 200*2 
10..219;244..252;270..286;335..345;354..371;390..397 


249 


68..80;91..99;132..138;185..193;265..273;276..293;295..306;305..329;327..341;347..358;394..403


250 


28..37;60..67;73..81 


251 


33..45;64..71 


252 


20..30;35..45;49..59;74..83 


253 


3..9;59..65 


254 


22 33-35 52-53 67-70 77-80 100-106 117-142 147 


255 


116 123*147 156-201 208-262 278 


956 


10 15-116 191 

1 U..1J) 1 1U..1Z-1 


257 


41 51-59 66*79 80*94 101-190 197-114 147-180 191*904 9 
10 227 240 


258 


147 157*189 199 


259 


52..59;66..76;103..113;115..127;131..140;143..148;181..199;24 
2 250-253 262-262 273-279 289-330 341-342 366-373 394 


260 


94..107;112..119;125..134 


261 


121..126;144..152;213..224 


263 


44..50 


264 


51..58;82..90;153..164 


266 


15..20;38..49;76..81;95..105 


267 


74..91;94..99;117..130;140..154;153..161;175..184;201..210;228..240;250..255


268 


36..42;43..54 


269 


41..46;64..73;80..100;106..122;160..172 


270 


38..48;82..88 


271 


34..40;72..79;111..123;146..153;251..259;307..314;316..322;372..377;436..444


272 


12..17;51..58;75..85;128..136 


273 


4..13;56..64 


274 


34..46;120..127;157..163;182..191;231..240;259..267;273..279;291..299;344..355


275 


30..55;72..78 


276 


27..35;37..45;49..61;61..77;102..109;144..152;170..180;179..188
277


61 67 147 152 154 166 284 299 308 313 


278 


79 89*451 461*532 541 


279 


24..31;72..84;83..92;97..111;144..149;161..182;181..189;192..1
98-704 214-216 233-241 254-256 263 


280 


15..20;50..56;61..66;204..212;251..260;354..362 


281 


24..38;44..52 


ISO 


/Z..oz,4j 1..4o1 ? jJz..j41 


283 


1..6;21..31;77..83;115..120;228..237;276..281;335..343;401..40 

I ;44U. .4 j o;4!>o. .4oo 


284 


4..9;39..47;50..66;78..94;111..122;132..141;169..174;190..202; 

oil iir\.iA*7 iGi.i/zi n/t.ooo ir\A.i/co n/z,iic\ toa.ioc /i 
zl J..zzU;z4i..zjz;zol..z/4;zoz..3UU;io9..i /o;3 /9..3o9;39j..4 

01 

U3 


285 


29..38;42..47;58..65;100..110;121..134;156..161;161..173;201.. 

907-910 910-941 9^4-900 109 


90A 

zoo 


1 £-91 90*77 Ql-11*\ 190*990 917«97£ 901«11*"C Izll-/I01 zio 
1 ..0,zl ..Zo, / / ..oj,1 1 J.. lzU,zZo..Z3 /,Z /O..ZOI ,J3j..J43,4U 1 ..4U 

7 

/ 


987 


9 10-04 104-948 958-968 986 


988 

ZOO 


68 80-01 00- H9 H8-185 104-969 970-977 988-901 701-70 

UO..Ov,7l ..77,1 JZ...1 JO,10J..l ]7H ? Z,UZ. .Z / V7,Z / J..Z,00,Z7l .JWl,Jiy 

0 324-322 336*342 353*389 398 


289 


23..38


291 


28 35*96 104*134 144*159 167*177 187*191 198 


292 


1 7 56 64*66 73*77 92 


293 


40..45;99..109 


904 

Z7H 


47 S7-190 196 

H / ..J / , 1ZU..1ZO 


Z,7 J 


71 61-76 89-147 140-1S6 160 

J1..01,/O..OZ, InJ.. 1H.7, 1 JU-. 1U7 


906 

Z.70 


177 147-151 156*161 167-160 181-18S 104 

1 J J .. 1 Hj, 1 J jO,1U1..1D/,1U7. .101,1 O J.. 1*7*+ 


907 

Z.7 7 / 


SO S8-S0 60-117 197-190 177 

JU. JO,J7..07,l 1 J..1ZJ,1ZU.. 1 J / 


908 


4S SS-S9 67-106 117-118 198-196 171-148 1S-v1S7 164-17 
t fj..JJ,JZ..Oj, ll/O.. 1 1 / ,! 1 0.. 1 ZO, 1 ZO.. 13 1, 1 Ho.. 1 DJy 1 J / .. 1 OH, 1 / 

9 100-919 991-939 947 


299 


51..59;82..87;113..125;124..135


100 


79 89-4S1 461**\19 S41 
/Z..0Z,H3 1 ..HOI ,jjZ..j £ fl 


101 


47 S9-88 10S-109 911-9SS 971 
HJ..3Z,00.. 1 UJ, 1*7Z..Z 1 1,ZJ J..Z / 1 


302 


3..18;37..44;57..65;70..76;98..113;121..134 


101 


1A ^<-79 77-07 0Q-1O£ 111 

JU..3.7, /Z\. / /,oZ..oo, lUO.. 1U 


lOzt 
JUH 


9 1 1 -11 A9-/10 *""A- i *\*\ 61-199 1 1 1 - 1 ,17 1**\/1-168 100*900 900 

z.. 1 1 , j j.. hz, ho.. jh, j j..0j, izz.. i j l , ih / .. i z>h, loo., i ou,zuu..zuy 

■91 1 990*996 977-968 978*986 901 
,Z 1 1 ..ZZU,ZZO..ZJ J,ZO0..Z / 0,ZOO..Z7 1 


305 


22..31


10A 
3UO 


S 1 1 -9S l i *'-79 81-19/1 114-1/17 1^7-IAl 178-177 18A-18S 1 
J.. 1 1 ,Zj.. J J, /Z..01 , 1ZH.. 1 3H, IH / .. 1 J / , 1 OJ.. 1 /Oj 1 / / ..loO, 1 03.. 1 

95;207..217 


307 


23..38 


308 


66..72;84..100 


309 


2..11;33..42;48..54;55..63;122..131;147..154;168..180;200..209;211..220;226..233;268..278;286..291


310 


45..52;60..68;88..94;99..109;113..120;121..134;162..171;169..184;194..202;209..215;223..235;239..248;273..281;292..301;319..329;336..341;389..394;398..405;421..426
311


15..21;28..35;82..91;113..120;125..133;153..167;236..243;291..

/"QZ-HM ^17-116 Hl-'XOCi ^QA-zinA AAI-Alfi ZK7 


^19 

J 1Z 


1S 91-98 1V89 01-11^ 190-19S TH-ISl 167-9^6 94^-901 
90R107 ^19-^16 ^97-^^9 ^70-^70 ^89 


D 1 J 


^8 46-S9 60-7S 8^-09 QQ-I^H 1S0-167 1$tt-1R7 900-910 91 
jo. .hu, jz. .vju, / j ..oj z,..yy ) xdd .. iju, iu / .. ioj, 10 / ..zvv/ 9 ziv/..zi 

9;239..256 


314 


36..42;52..58;65..70;80..87;143..155;161..168;176..185;203..208;263..272


315 


36..42;52..58;65..70;80..87;143..155;161..168


316 


33..47;49..58;106..117;125..132 


317 


45..53;54..68;88..94;99..109;113..120;121..134;162..171;169..1

54;194..zUz;209..zl5;223..z35;239..z48;z7 

..3zy 3 33D..341,3oy..3y4,3yo..4Uj 5 4zi..4zo 


318 


42 5469 77-124 130-148 153-156 165-186 210*219 227-27 
1 286 293 300 


319 


15..21;25..37;36..59;58..64;80..89;86..102;105..119 


320 


1..6;81..89 


321 


111..116


322 


1..21;50..68;76..85 


323 


11..16;49..68 


325 


14..20;40..55;69..76;122..131;128..138;158..166 


326 


18 45-61 68*81 89*110 141 142 149 

x \j * » r ^ a » • v/ 5 w x « 1 v v j i. a. v» » ■ x r x j x r z^ ■ ■ x 1 ^ 


327 


43..48;53..62;85..95 


328 


4..9;34..46 


329 


1..7;58..69 


^10 


1A Al-Afx S9S6 66 


jji 


SO 60-79 RO-80 KOQO OR-110 IIS 




\A 94-SO S6 


J J J 


1 641 d7-R6 06- 190 \*\A 


11A 


17 9^--d.1 Al-^Ci S7-RS 00-06 10S-1SJ. 1SQ-181 1Q9-1Q9 108 


335 


4..9;21..28 


336 


49 65 -70 80 


337 


20..30;36..42;53..61;74..83;110..119;125..138;137..142 


338 


21..55;55..65;62..81;88..100;99..107 


339 


33..47;52..60;73..82 


340 


5..12;13..21;32..51;65..72 


341 


19..30;44..52;51..61;68..82;96..108 


342 


18 26'44 54*71 77 


343 


21 33 57 66 68 77 80 89 91 98 


344 


36 44*89 95-110 116*181 193 


345 


19..28;34..41;43..55;80..86;119..124;159..165;167..176 


346 


26..35;39..49;88..95;136..145;207..217


347 


2..10;26..32;52..68;75..86 


348 


71..77;82..89;114..125 


349 


54..66;70..76;294..302 


350 


16..25;31..37;69..80 
351


68..73;118..129;135..148 


352 


68..73;118..129;135..148 


353 


10..18;61..69;68..77 


354 


13..20;44..51;57..85;94..102;143..151 


355 


18..30;38..43;43..54 


356 


1..6;40..45;120..126;129..154;153..162 


357 


49..59;133..141;152..167 


358 


9..26;80..85;96..102 


359 


98..104;181..187 


360 


1..6;44..54;113..123;147..161 


361 


64..74;77..90;112..141 


362 


20..30;37..42;53..61;74..83;110..119;125..134;138..147 


363 


1..11;26..31;53..60;97..105;110..117;141..146 


364 


16..24


365 


7..15


367 


8..14


368 


16..23 


369 


5..14;59..71;74..83 


370 


39..45;45..67;82..91


371 


37..46;81..86;92..99;140..156;163..169;179..184;209..216;242..252;258..268


372 


72..81;93..104;141..146 


373 


71..77;86..93 


374 


1..7;31..47;47..57;59..65 


375 


5..10;55..60;83..90 


376 


34..43;40..46;67..81 


377 


27..38;49..56;54..64;68..78;111..122


378 


9..26;80..85;96..102


379 


41..50;52..58;195..204;318..327;337..347


380 


98..103


381 


46..51 


382 


67..72 


383 


4..12;25..33;73..81 


384 


42..54;69..77;117..127;125..141 


385 


105..112;146..155;181..187;200..214;230..243;254..264;261..272;290..295


386 


46 5392 100-107 114-116 123-126 132-176 18? 


387


5 17-44 50-158 167-1Q0 10S 


388 


16 25-31 37-73 79-96 101*1 OS 110-1 166-215 224-240 2 




47*273 280*299 306*321 334*347 352*370 375 


389 


28..35;72..81;93..100;101..113;183..192;218..228;284..300;318..330;339..344;443..454;481..489;491..500


390 


1..8;39..48 


391 


104..114;157..162 


392 


28..34;31..36;46..55;77..85 


393 


104..114;188..224 


394 


44..51;107..114 
395


2..15;51..61;82..88;104..114;249..259;286..298;333..340;361..367


396 


24 31-54 65-79 85-92 99180 186-216 221 

z,*-r. . j i,.j*t..L"J, / ^..o.j,.7z....7.7,iov/.. i ou,aiu. i 


397 


20..33;31..41;67..75;82..89;168..173 


398


6*nQ3 101-13S 1/13-108 903 


399 


11..17;37..48;71..79;94..100;99..112;132..144;161..173;173..18 


400 


21..31;65..84;91..99 


401


l..y;l 1..2/; /o\.53;oo..yo;lU/..l 12;1 12.. 121; 135.. 141; 147.. 153; 

1 1 70 
1 04.. 1 /U 


402 


1..9;11..35;83..94;106..111;111..120;134..140;146..152;163..16 

Q 


403 


2..15;51..61;82..88;104..114;249..259;286..298;333..340;361..367


405


104..114;188..224


406


1.. 7;ol.. 71; 7 7..o3;oU..o5; 163.. 1 /y;2U4..21 1;21U..225;23 1..23o; 

9<\zl 9*\Q-979 972-10<l 191 -1/17 




77-190 110-1zlzl 1 ^0- 1 1£Q-1Q£ 909-9AA 979-111 14^ 

jj..//, izu.. i ju, i*f*t.. oy, i jy.. iov, iyo..zuz,zoo..z i ..jh j 


409 


1..9;73..81;226..236 


410 


12..21;37..49;78..85 


411


22..3o;151..15o;lol..l /U 


412 


56..65;136..146;179..186;192..207;227..236 


413 


8..14;92..99;184..191 


414 


1..10;16..30;71..77;95..105;102..113;123..130;137..145;165..171;199..205;217..226;286..295;309..314;360..376;384..389


415 


19..34;100..107;115..123;143..149;154..164;168..175;176..189;217..226;224..239;249..257;264..270;278..290;294..303;328..336;347..356;374..384;391..396;444..449;453..460;476..481


Zt1 fx 


1 0*"\ 119-1A7 1-\<\-18l 187-900 91ZL-910 9zL1-9*\Zl 9£A-9A1 9 
1UJ .. 1 IZ, l*t /.. 1 Jj, 10 1 1 0 / ,ZUU..Z l^,ZjU..Z*f j,ZJ^+..Zl> 1 t,ZOI ..z 

72-990 995 


417 


8 14 I 


418


8 14-97 99*997 936-260 973-293 300 


419 


8 16-117 129 


420 


27 3V71 83*91 97*137 146 


421 


2 11 -77 82 


422 


20..26;61..69 


A91 
HZj 


0 10-9-n 1A-A1 AO-^7 A^-27 Q9-10A 11A-19A 1 1/1 


A9A. 
HZ-H- 


£ 18-91 1/1-*\1 £9-7Q 8A-8Q G^-108 111-19Q 1 AO 
0.. 1 o,Zl ..j4,jj..0Z, / V..oO,oy..-7D ) lUo.. I 13, IZo.. 14V 


HZ J 


9 11 -90 zlA-47 
Z.. 1 1 ,zy..*^+o,* i f / ..jj 


H-ZO 


j..ZO,jj..Oo 


477 
HZ / 


1 11-1Q 35MR AQ-S9 6OQ0 QR-100 10S-19Q 137-1^ 160 
1 .. 1 1 , l7..JOpo..n7,JZ..OU,jv..70, lull.. 1UJ, 1Z*7.. 1j/,1jj.. 1DU 


428 


23..31;47..61;65..72;87..94 


429 


3..12;31..39;59..69;82..87 


430 


81..91 


431 


36..44;83..89 
432


28..42;56..76;110..117


433 


5..14;43..49 


434 


9..17;15..21;61..70;80..89 


435 


1..24;32..40;52..60 


436 


1..11;17..30;29..43 


437 


20..30;28..35;40..50 


438 


10..28;76..85;91..99;107..112 


439 


86..92;104..110 


440 


36 42-47 57-63 71-106 121 


441 


19..25;27..44 


442 


1..14;21..29;36..42;44..51;62..81;101..109;111..119;138..149 


443 


10..18;25..31;33..40;51..70;89..94


444 


3..8;19..26;32..44 


445 


1..11;19..38;38..49;52..60;130..139 


446 


12..20;28..37;43..66;90..102 


447 


15..20;24..31;36..47;68..82;88..96 


448 


29..45;83..91;88..94;132..144 


449 


22..33;54..64;86..96;102..108 


450 


27..39;47..60;101..107;155..164;270..281;287..300;306..312;327..332


451 


13..24;60..70;77..83 


452 


8..14;74..82


453 


6..12;63..78;77..91;97..103;102..108 


454 


32..44;66..72;101..114;166..174;209..235;243..252;258..263 


455 


69..76;131..139;164..173 


456 


54..63;95..103;187..202;211..216;249..261 


457 


14..21;31..45;80..88;187..194;347..353 


458 


47..62;79..86


459 


1..8;27..37;90..97;99..106;123..140;145..163


460 


8..17;35..45;131..139;162..169;175..180 


461 


1..6;13..23;58..66;89..101 


462 


44..53;86..93 


463


62 70 


464 


50 57 59 69*67 73*79 95 


465


10 17*91 44 


466


3 15-71 78-110 121-125 131-259 269-296 306*312 318*340 




..346;353..363;370..379;407..412;417..425;448..453;483..493 


467 


5..12;20..30;70..78;82..100;106..115;129..135 


468 


8..16;22..31;36..45;75..84 


469 


14..23;98..105;106..116 


470 


11..23;26..31;54..62;101..107 


471 


23..29;66..81 


472 


23..29;93..100 


473 


8..25;79..89;103..109 


474 


37..45;80..89;94..101;125..130 
475


37..45;80..89;94..101;125..130 


476 


7..26;23..36;36..45;78..83;80..85 


477 


45..53 


478 


1..7;16..22;78..93;96..102 


479 


24..33;41..50;61..80;93..100;129..136;160..170;199..208;267..276;325..335


480 


5..14;43..51;102..116 


481 


2..15;16..24;53..62;87..97;100..109;109..133;145..152 


482 


2..15;16..24;53..62;87..97;100..109;109..133;145..152;168..176 
Table VIII



Seq Id No


Chromosomal location


2 


16p11-p13


12 


16p11-p13


22 


7q35-q36


25 


chr.19


34 


chr.17


35 


6p21.3


40 


chr.20


42 


12p13.3


45 


12p13.3


51 


12d 


56 


22q11.2-q13.2


57 


12p13 


60 


chr. 10 


62 


chr. 17 


65 


Xq13 


67 


chr.14 


70 


chr7MY7a11 23-a21 1(1^ 


71 


chr7MV7a11 23-a21 ' 


73 


6p21.3


74 


6p21.3


87 


19q13.1






Q4 




9Q 


6n21 I 


101 

1 \J 1 




103 


chr.17


106 


6q15-q16.3


107 


16p13.3


108 


12q


113 


1p33-p34.3


125 


6p22.1-p22.3


126 


16p13.3


127 


14q11.2


135 


22q11.2-q13.2


138 


chr.3


141 


12q24.1 


146 


3p21.3 


147 


chr.2


149 


chr. 17 


150 


21q 


152 


21q 


154 


20q12-q13.11 


155 


11p15.5 


160 


19q13.2 
161


19q13.2 


162 


20q12-q13.11 


164 


21q


166 


21q 


170 


6p12.1-p21.1 


172 


21q 


173 


chr.19 


176 


21q


177 


21q 


179 


chr.6


183 


chr.7


185 


Xq21.3-q22.3 


186 


chr.20 


192 


11q12.2 


195 


chr.20 


196 


20q13.1-q13.2 


197 


7p15-p21 


198 


19q13.3 


199 


chr.2


201 


Xq22.1-q23 


202 


Xq22.1-q23 


204 


chr.20 


205 


chr.5 


206 


chr.2


208 


chr.5 


214 


chr.12 


220 


Xq28 


224 


chr.7 


227 


chr.14 


230 


chr.7 


231 


chr.7


238 


19p13.3 
Table IX



Seq Id No 


Tissue distribution 


1 


Br:28;FB:25;FK:9;Ov:17;Pl:12;Pr:4;SC:2;SG:4;Te:9


2


Br:2;CP:1;FB:5;FK:1;Pl:3;Pr:10;SG:1


3


Br:1;CP:1;FB:33;FK:13;Li:2;Ov:19;PG:12;Pl:27;Pr:15;SG:9;SI:12


4


AG:1;CP:1;LG:1;Pr:3;Te:1


5


Pa:4;Pr:2


6


Li:1;Pa:4;Pr:3


7


Br:9;Pr:1;Te:3


8


Br:4;FB:1;Pr:3;SG:8


9


Br:4;Ce:1;Co:1;DM:4;FB:33;FK:16;He:3;Ki:6;LC:2;LG:4;Li:2;Lu:2;Ly:1;Ov:36;Pa:16;Pl:2;Pr:4;SC:2;SI:1;SN:1;Sp:1;UC:3;Ut:1


10


Br:1;CP:1;Pr:4;SG:2


11


Pr:2;SG:4


12


Br:1;CP:1;FB:5;FK:1;Pl:3;Pr:9;SG:1


13 


FL:4;Li:4 


14 


Li:4;Te:3 


15 


Te:1


16


Li:3;Te:6


17


Ce:1;FB:6;Li:1;Pl:5;Te:16


18


Li:7;Te:6


19


Li:27;Te:9


20


Li:1;Te:3


21


Te:3


22


Te:3


23


Li:1;Te:6


24


Li:2;Te:2


25


Te:8


26


Te:5


27


LC:1;Te:2


28


Li:1;Te:2


29


AG:2;BM:1;Br:16;CP:1;Co:2;DM:1;FB:45;FK:62;FL:1;HP:3;LC:1;Li:2;Mu:1;Ov:2;Pr:10;SI:5;SN:3;Te:9;UC:1


30


Li:2


31


Br:3;CP:1;FB:1;FK:6;Pr:1;Te:2


32


Br:1;CP:1;Ce:6;Ov:1;Te:2


33


FK:5;SC:1


34


Br:1;FB:2;FK:48;Pl:2;SN:1


35


Te:1


36


FB:5;Pr:1;SN:1
37


FB:3;FK:1;Li:1;SG:5


38


FB:10


39


FB:3


40


Br:1;DM:1;FL:1;Pl:4;SG:13


41


FB:3;FK:1;Li:1;SG:5


42


BM:1;SG:19


43


SG:1


44


CP:1;FB:1;Mu:2;Pl:9;SG:7


45


BM:1;SG:20


46


BM:1;DM:1;FB:5;FK:6;FL:1;He:1;Ki:2;Ov:9;Pl:1;SG:1;SI:1;Te:1


47


Br:4;FB:4;Pr:3;SG:8


48


Br:12;Ce:1;Co:1;FB:5;FK:4;FL:5;HP:1;Ki:1;LC:1;Li:6;Ov:8;Pl:105;SC:1;SG:8;Te:4


49


Br:7;Ce:1;Co:1;FK:4;HP:1;Ki:1;LC:1;Li:5;Ov:8;Pl:5;SC:1;Te:1


50


AG:1;CP:4;Ce:1;DM:2;FB:6;FK:4;FL:2;HP:2;LC:1;LG:3;Li:31;Lu:3;Mu:1;Ov:25;Pl:15;Pr:20;SC:1;Te:75;UC:5;Ut:1


51


FL:1


52


Br:2;CP:1;FB:3;FK:1;FL:5;LC:2;Pl:1;Pr:2;UC:2


53


Br:3;FK:4;FL:4;HP:1;Li:3;Pl:11;SG:1;Te:1


54


Br:15;Ce:1;FB:10;FK:10;FL:1;He:1;Ki:6;LC:1;Li:4;Ov:32;Pa:3;Pl:2;Pr:4;SC:1;SN:
z,c>p.4, le.o,UC. l;Ut: 1 


55 


FL:2 


56 


Br:1;FB:1;FL:1;Te:1


57


FL:4


58


FL:1;Li:1


59


FB:3;FK:1;Li:1;SG:5


60


Br:1;FB:1;FL:1;Pr:2


61


Br:2;Pl:1


62


Br:6;CP:1;Ce:7;FB:37;FK:4;FL:1;Pl:6;Pr:1;SG:3;SN:3


63


Br:10


64


Br:2;CP:2


65


Br:1;FB:11;LG:1;Th:1


66


Br:30;Ce:1;Co:1;FB:60;FK:15;FL:3;HP:1;Ki:1;LC:1;Li:6;Ov:57;PG:9;Pl:145;Pr:^
oU.l ? M:4,ie.4 


67 


Br:4;CP:1;FB:14;Ki:1;Li:1;Lu:2;Pr:1;Te:1


68


Br:10


69


AG:1;Br:48;FB:3;FK:5;HP:1;He:1;Li:1;Pl:11;SC:2;SG:1;Te:2;Ut:1


70


Br:11;DM:1;He:1


71


DM:1;He:1


72


Br:9;Pr:1;Te:2


73


Br:8;Pr:1


74 


Br:5 
Afi-1 Rr4Q-FR-4 FK-S HP-1 -Hp-1 I C- 1 -T H -PH 1 SC-2SG- 1 Te-21 Jt- 1


11 


Br:2;FK:2;HP:1;LC:1;Li:2;Ov:14;Pl:1;Pr:14;Te:5


78 1 
/o 


D r Q.pp.i FiA/r-1 17R-71 F1T-1 R-FT -1 -UP-I .Ua-I IfiQT P?T H-zl'T i?T ii?-rh/ -^4-Pl 
Jt>r.!7,^c. 1,1JJV1. 1 ,rD.z 1 ,ris.. lOjr.L. i,xix . i,rie. i,ivi.y,x^v^.z,i^o.H,x^i.z 5 x^u.z 5 vjv.z>%i^ 

:3;Pr:4;SC:1;SI:2;SN:2;Sp:1;Ut:1


70 
/ y 


Pr- 1 


ro 


Rr O PP-9-rn- 1 DM ^ FR- 1 ■FKfvrfp?lc'v4T P1-T H-I T v7Ov-40P?r 1 *P1?«Pr 1 • 

SN:2;Sp:1;Te:12;UC:1;Ut:3


O 1 


FK-1 -Tp-1 


8? 


T r 1 


OJ 


Rr-?-rP-1 FR*10FK?KrVT i-7-Ov 1 0-SC* 1 SN- 1 Tp* 1 T JC- 1 


84 


RrSFR-14FK-9T i-frOv 1 7-SG-R-Tp'R 


OJ 


T ifvTp? 


Rf\ 

OU 


T i*?Tp? 

JUL/., 1 CZ, 


o / 


Rr-1-FR-^-FK^1-I i ?0 Ov ^7 PO S*P1-69 ST S Te*S 

Dl . 1 ,X^ XJ..} J,PXV._J 1 ? L/l . Z-V/j W V .J / ? X VJ . J,X l.U7,01.J, JLC.J 


RR 
oo 


T i-1 Pr-1 Tp-7TTt? 

l,fl. 1 , 1 c / , LJ l.Z. 




Te*1 

J. V. 1 


90 
y\j 


Tp? i 


91 


FR1 SFK -VI r2 0vl7 Pr-4 SG*7 Te-4 


9? 


Tp? 


9^ 


Rr-4FR- 1 -<sN- 1 Tp? 


94 


Tp*1 


9S 


T r? 


96 


AG-1 -Rr-1 FR-1 


97 

y i 


FKVTp? 


QR 
yo 


Tp ^ 

1 C. J 


99 

yy 


Rr ^ FR ?Q FTC-1 T i-1 0-Ov 1 *P1- 1 fvPr?SG- 1 -Tp49 


1 no 


Rr-?-FR ^-Flf -1 -Ov VTf>*1 




Rr- 1 0 FR ^4 F1^- 1 -Ovl Pl ftS Pr* 1 *T Tt* 1 


10? 


FR6 


1 0^ 


FR-fvT i-^-PH Pr-1 v.G-1 Tp-7 
rD.u,Lij,n. i,n. i i, ic. / 


104 


Rr ?fi PP-1 FR-8 F]<r ni P1?0 Pr ?0 
Dl .z,Uj\w/jr . l ,rD.o,rjv. ui ,ri.Li/jr i .z.v/ 


10S 


Rr ^ rP ? DM ? FR-1 1 FKVT G?Ov1 P1*6SC?SG-1 \SN*4 


106 


FB:4 


1 07 


Rr*^ FR-^0-FT^-^Q FT *^-Pr-1 


1 OR 


Rr-1 -FR-8-T i-1 T n-1 


1 OQ 

ivy 


FR-1 1 -Pr-?1 
rx). 1 1 ,xT .Z l 


1 1 0 


Rr-1zl P^-1 FR-^ FIT -^ FT -1 FTP-1 FT^ ^ Fi-^ T P-1 T i-4 T n-1 -n\/-7-P1?-Pr- 1 -^T-1 '\n 

or. i^jV^e. 1 ,rD.j,rK.j,rL. 1 ,xix . x,xxc.z,xvio,i^v-/. 1,1^1.^1^11. i ,uv, /,n.z,rr. i,oi. i ,op 
:1;UC:1 


111 


Br:1;Ce:1;FK:1;HP:1;He:2;Ki:3;LC:1;Li:3;Lu:1;Ov:7;Sp:1;UC:1


112


Br:1;HP:1;Lu:1;Pr:1;SG:9;Ut:1


113 


HP:1;SG:4 


114 


FK:9 
115


AG:1;Br:3;CP:1;FB:14;FK:19;FL:1;HP:1;Pr:1;SG:1


116


Br:5;CP:1;Ce:1;Co:1;DM:5;FK:3;FL:1;LC:3;LG:1;Lu:1;Ov:23;Pl:1;Te:8;UC


117


Br:1;Ce:1;FB:1;FK:1;FL:2;Pl:3;SN:1;Te:1;UC:1


118


CP:1;DM:1;FB:5;FK:2;FL:2;He:1;Lu:1;Ly:1;Ov:23;Pr:1;SN:2;Sp:2;Ut:1


119


Li:2;Te:7


120


Br:6;Co:2;FB:1;FK:6;FL:1;Ov:3;Pl:32;Pr:1;SN:1


121


AG:1;Br:4


122


Br:5


123


Br:1


124


Br:2;Ki:2;Li:1;Ov:7;UC:1


125


Br:2;FB:1;FL:6;He:1;Li:1;Ov:1;Pl:2;Pr:10;Te:1;Th:1


126


Br:1


127


BM:1;Br:2;CP:2;FB:1;FK:3;HP:1;He:1;LG:1;Pl:1;Pr:1;SC:2;SG:2;Te:5;Ut:3


128


Br:1


129


Br:2;FB:6;Li:1;SG:3;Te:2


130


Br:25;FB:3;FL:2


131


Br:1


132


Br:1


133


Br:1


134


Br:2;SN:1


135


Br:11


136


AG:1;Br:1;FL:1


137


Br:1;Ce:1;FB:1;FK:1;FL:2;Pl:3;SN:1;Te:1;UC:1


138 


Br:43 


139 


Br:11;CP:2;Co:1;DM:6;FB:1;FK:6;He:2;Ki:4;LC:1;LG:1;Ov:40;Pa:1;Pl:2;Pr:1;SN:2;Sp:1;Te:9;UC:1;Ut:3


140


Br:23;Ce:1;DM:3;FB:38;FK:17;FL:2;HP:1;He:1;Ki:8;LC:3;LG:2;Li:6;Lu:1;Ly:1;Ov:40;Pr:4;SC:2;SN:4;Sp:1;Te:5;UC:1;Ut:1


141


Br:39;FB:3;SN:2


142


Br:10;SN:2


143


Br:26;FK:2;HP:1;LC:1;Li:2;Ov:14;Pl:3;Pr:3;Te:5


144


Br:14;Pr:2


145


FB:12;LG:1;Pr:4;Te:1;Ut:2


146


Li:1;Ov:2;Pr:5;SG:11


147


Li:1;Te:1


148


Br:1;FB:1;Li:1;Te:1


149


Br:3;FB:5;FK:5;Li:1;Pl:8;Te:5


150 


FK:6;Pr:2;SG:8 


151 


FK:9 


152 


FK:6;Pr:2;SG:9 


153 


Te:1
154


FB:28;Ov:4 


155 


Br:21;Ce:1;FB:32;FK:4


156


Br:5;CP:1;FB:16;FK:3;He:1;Ki:5;Li:1;Ov:15;Pl:3;SG:2;SI:1;Sp:1;UC:1


157 


FB:14;FK:1;FL:1;SG:1 


158 


FB:7


159 


FB:10 


160 


Ce:2;FB:12 


161 


Ce:2 


162 


FB:28;Ov:2 


163 


FB:14;FK:1;FL:1;SG:1 


164 


FK:4;Pr:1;SG:9


165


Br:4;Co:1;Ki:1;Ov:2;Pr:4;SG:1


166


FK:6;Pr:2;SG:9


167


Br:1;FB:1;SG:5


168


Br:1;FB:5;FK:7;SG:1;UC:1


169


FK:2


170


FL:12


171


Br:2;FB:1;FK:1;Pl:7


172


Br:106;FB:2;Pl:7


173


Br:14;FB:1;Pl:2;Te:1


174


Br:17;He:1;Pl:1;SC:2;Te:1


175 


Br:14;Pr:2 


176 


Br:106;FB:2;Pl:7 


177 


Br:114;FB:7;FK:7;Ov:2;Pl:7;Pr:2;Te:9


178


Br:16;CP:2;FB:2;FK:2;FL:1;Li:1;Pl:13;Pr:3;SC:1;Ut:1


179


FL:1;HP:2;Pr:2;Te:1


180


Pr:2


181


FB:1;Ov:2;Pr:1;UC:1


182


BM:1;Br:4;DM:1;FB:6;FK:6;Ki:5;LC:2;LG:1;Li:1;Lu:1;Ov:15;Pl:1;Pr:2;SC:1;Sp:2;Te:2;Ut:1


183


Br:8;CP:1;Co:2;DM:4;FB:1;FK:1;Ki:4;LC:1;Li:3;Ov:33;Pl:1;Pr:5;SC:2;SN:^
Te:5;UC:1;Ut:2


184


Pr:1


185


FB:2;Li:1;Ov:1;SG:7;Te:5


186


Te:3


187


Te:1


188


Br:18;CP:1;DM:5;FB:40;FK:23;FL:2;He:3;Ki:10;LC:2;LG:1;Li:13;Lu:3;Ly:2;Mu:1;Ov:54;Pl:5;Pr:14;SC:2;SG:2;SI:2;SN:4;Sp:3;Te:4;UC:4


189


Li:1;Te:1


190


Br:7;CP:1;FB:1;FK:4;FL:5;He:1;Li:1;Ov:1;Pl:2;Pr:4;SG:1


191 


Li:2;Te:4 
192


AG:1;Br:2;CP:1;FB:32;FK:1;Li:1;Ov:36;Pl:49;Pr:3;SC:1;SG:4;SN:4;Te:9;UC:1;Ut:

9 

z- 


193


FB:31;FK:75;FL:7;Ov:12;Pl:23;Pr:8;SG:3;Te:16


194


Te:2 


195


Te:7 


196 


Te:2 


197 


Te:3 


198 


Li:10;Te:43 


199 


Br:35;CP:3;FB:39;FK:56;FL:7;HP:1;LG:1;Li:1;Ly:1;Ov:2;Pl:10;Pr:8;SG:1;Te:4;U^
z 


200 


FB:17;FK:9;FL:5;Ov:21;Pl:41;Te:3


201 


FK:16;SI:1


202 


Br:1;Co:1;FB:111;FK:25;He:1;Li:4;Ov:3;Pr:6;Te:1


204 


Te:7 


205 


Li:7;Te:28 


206 


FB:28;Li:2;Ov:23;PG:l l;Pl:45;SG:l7;SI:ll;Te:9 


207 


FB:16;FK:1;Ov:1;SC:1;Te:1


208 


FB:5 


209 


FB:6 


210 


Br:1;FB:22


211 


Br:2;Ce:3;FB:6;FK:1


212 


Br:1;Co:2;FB:22;FK:2;LG:2;Mu:2;Pl:2;SG:4


213 


Br:2;DM:1;FB:8;FK:8;FL:1;Ki:1;LG:3;Ov:5;Pa:1;Pl:4;Pr:1;S


214 


FB:7 


215 


FB:4 


216 


Ov:3;SG:3 


217 


Br:4;CP:2;DM:1;FB:9;FK:3;Ki:2;LC:1;LG:1;Lu:3;Ly:1;Ov:14;Pl:1;Pr:1;SC:^
;Sp:1;Te:1;Ut:1


218 


FB:4;FK:2;Pl:1;Pr:11;SG:1


219


Br:7;CP:3;FB:2;FL:1;HP:4;Lu:1;Ly:2;Mu:1;Ov:3;Pl:1;Pr:1;SN:2;Te:1


220 


Br:1;FL:1;Pl:2


221 


Co:1;FB:2;FL:1;Li:1;Pl:2


222 


FL:1;SG:2


223 


Li:1;Te:1


225 


Li:10 


226 


Li:1;Te:4


227 


Li:1


228 


Br:1


229 


Br:3 


230 


Br:5;Ce:1;Co:1;DM:3;FB:1;FK:1;He:1;LC:1;LG:2;Ov:16;Pl:3;Pr:1;Te:2;Ut:1


231 


Br:3;Ce:1;Co:1;DM:3;FB:1;FK:1;He:1;LC:1;LG:2;Ov:16;Pl:3;Pr:1;Te:2;Ut:1


232


AG:1;Br:17;CP:2;DM:1;FB:51;FK:9;FL:3;Li:3;Ov:3;Pl:2;Pr:10;SC:1;SG:5;Te:2;Ut:1


233 


Br:13


234 


Br:5 


235 


Br:1;Pl:1


236 


Br:9 


237 


Br:22;DM:2;FB:17;FK:9;Ki:4;LG:1;Li:1;Lu:2;Ov:24;Pr:3;SC:1;SI:1;SN:2;Te:2


238 


Br:17


239 


Br:11


240 


Br:28;Ce:1;DM:5;FB:52;FK:40;FL:2;HP:1;He:2;Ki:3;LC:1;LG:3;Li:1;Ly:1;Ov:28;Pl:1;Pr:5;SC:1;SI:1;SN:3;Sp:6;Te:1;UC:1;Ut:1


241 


Br:4;Ce:1;DM:5;FB:5;FK:7;HP:1;He:2;Ki:3;LC:1;LG:3;Li:1;Ly:1;Ov:28;Pl:1;SC:1;SN:3;Sp:6;Te:1;UC:1;Ut:1


Table X



Seq Id No 


Low frequency 
expression 


High frequency 
expression 


1 


- 


Br,Ov 


2 


- 


Pr 


3 


Br,Te 


Ov,PG,Pl,SI 


4 


- 


AG 


5 


- 


Pa 


6 


- 


Pa 


7 


- 


Br 


8 


- 


SG 


9 


Br,Te 


DM,He,Ki,Ov,Pa 


10 


- 


Pr 


11 


- 


SG 


12 


- 


Pr 


13 


- 


FLXi ; 


14 


- 


Li,Te


15 


- 


Te 


16 


- 


Li,Te 


17 


- 


Te 


18 


- 


Li,Te 


19 


- 


Li,Te 


20 


- 


Te 


21 


- 


Te 


22 


- 


Te 


23 


- 


Te


24 


- 


Li 


25 


- 


Te 


26 


- 


Te 


27 


- 


LC,Te 


28 


- 


Te 


29 


Pl


FK 


30 


- 


Li 


31 




FK 


32 




Ce 


33 




FK,SC 


34 


FB 


FK 


35 




Te 


36 




SN 


37 




SG


1ft

JO 




rr> 


A(\ 
4U 






A\ 
4 1 




qp 

OU 


AO 
4Z 






A1 
43 






A A 

44 




MU,rl,ovj 


45 




"DA/T OP 


/I /C 

46 




BM,Ki 3 Uv ! ! 


a n 
4/ 




op 


4o 


rB ? rK,rr 


T>1 

rl 


/in 

4v 




Ki,Uv 


5U 


T>« T7TD "CV OP 


Li,Uv, 1 e 


c. i 




rL 


5z 




TTT T P T TP 


CO 

53 




rl 


54 




t/" ; a„ T) 0 o*% 

Ri,L>v,ra,c>p 


55 






5 1 




rL 


CO 
5o 




rJL 


5y 




CP 


6z 




Ce,rB 


OJ 




Br 


64 




PT> 


65 




rB,ln 


66 


r? \r op t 1 -. 


r>p t>i i 


67 




TT"D 17, T >, 

rB,Ki,Lu 


/CO 

68 




Br 


/cn 
69 


rB 


Br 


*7A 
/U 




Br,DM,rle 


/l 




DJVl,rle 






Br 


15 




Br 


/4 




Br 


/5 




Br 


/6 


JrB 


o.- 


1 1 


T7T) 

rB 


Uv 5 rr 


78 




Ki,Uv 


oU 


rr> 


JLJM,Kl,vJV 


82 




Li 


83 




Ki.Li.Ov 


84 




Ov 


85 




Li


86


- 


Li 


87 


Br,Pr,SG 


Ov,PG,Pl 


88 


- 


Te,Ut 


89 


- 


Te 


90 


- 


Te 


91 


- 


Ov 


92 


- 


Te 


93 


- 


SN 


94 


- 


Te 


95 


- 


Li 


96 


- 


AG 


97 


- 


FK 


98 


- 


Te 


99 


FK 


Te 


100 


- 


Ov 


101 


FK 


Pl


102 


- 


FB 


103 


- 


Te 


104 


FB,Li,SG,Te 


FK 


105 


- 


DM,SN 


106 


- 


FB 


107 


Br,Pl 


FB,FK 


108 


- 


FB,Lu 


109 


- 


Pr 


110 


- 


He,Ki,Ov 


111 


- 


Ce,He,Ki,Lu,Ov 


112 


- 


Lu,SG 


113 


- 


HP,SG 


114 


- 


FK 


115 


- 


FK 


116 


FB 


DM,LC,Ov,Ut 


117 


- 


Ce,UC 


118 


- 


Ov,Sp 


119 


- 


Te 


120 


FB 


Co,Pl 


121 


- 


AG,Br 


122 


- 


Br 


124 




Ki,Ov 


125 




FL,Pr,Th 


127 




BM,SC,Ut 


130 




Br 


134 




SN


1 7A
1 JO 




Art 


1 J / 






1 7R 

1 Jo 


VVK 
r d 


DI 


1 7Q 

i jy 


r o 


DlV/f l^i H\/ T Tt 

UlVl,Jvl,wV,U l 


izin 

14U 


PI 

Jrl 


JvljL^V 






Jdx 


14Z 






1 /17 




r$r,uv 


1/1/1 
144 




r>r 


1 /I ^ 
143 




T?D TT+ 


1 /1A 
140 




cri | 


1 AQ 
14V 




P1 

Jrl 


1 

13U 






1^1 
ID 1 




JrJs. 


1 ^7 
13Z 




pit en 






1 e 


1 <\A 
1 34 




rDjUV 


13 J 






1 <\A 
130 




Ifi 


13 / 




PT* 


1^2 
135 




P13 


1 

13y 




PR 
rD 


i ah 
10U 




r: r> 


1 ai 
101 






1 A 7 

10Z 




r?D 
rr> 


1 A 7 
lOJ 




JrrS 


1 A/1 

104 




cri 


1 A< 
103 




L,0,iVl,UV 


1 AA 
100 




PV Qf^ 


1 A7 
10 / 






1 AC 
lOo 




PI^ 
r rv 


1 AG 
10V 






1 7ft 
1 /U 




PT 


171 
1/1 




PI 


1 77 
1 /Z 


T7T3 TTT^ 


p r 
r>r 


1 77 
1 / J 




r>r 


1 7/1 
1 /4 






1 7*\ 




p r 

DI 


176 


FB,FK,Pr 


Br 


177 


FB 


Br 


178 




Br,Pl 


179 




HP


180




Pr


181 




wv,u^ 


1 89 






1 8^ 


FR 
r id 


j_y lvi , jvi , \j v 


1 8S 




l)U, 1 e 


1 86 
1 oo 




1 e 


1 87 
1 O / 




1 C 


1 

1 oo 


P1 


rV\/T T^i Hi; 
i^lVIjJvljiJV 


1 on 




rL, 


1 Q1 
ly 1 




1 e 


1 09 

lyz 




Hv PI 


1 

150 


r>r 




1 CM 




i c 


1 CK 




I e 






1 e 


1 Q7 

Ly / 




1 C 


108 

1 70 


FR 

ID 


T i TV 


1 QQ 




Flf 
riv 


zuu 


Dl 


PI 


901 
ZU 1 




Fl^ 


909 
ZUZ 






9fH 


D r pi 


Ft* 


9HA 
ZU^ 




1 e 


9fK 




T i TV 1 


9H£ 
ZUO 


Tir W Pr- 


Ht; p/^i pi on ct « 


9H7 
ZU / 




rr> 


90S 
ZUo 




rr> 


zuy 




rr> 


9 1 0 
Z1U 




rp 
rr> 


9 1 1 
Zl 1 




rv 


9 1 9 
Z 1Z 




\^0 3 rr>,iviu 


911 






9 1 A 
Z l^f 




rD 


91 <\ 
Z 1 3 




rD 


91 £ 

z 10 




UVjOu 


91 7 
Zl / 




Jvl,JLU,wV 


91 52 
Z 1 0 




rT 


91 Q 




pp up y v rjv <sXF 


221 




Co 


222 




SG 


223 




SG 


225 




Li


226




Te


227




Li


229




Br


231




DM,Ov


232




FB


233




Br


234




Br


236




Br


237 




Ki,Lu,Ov


238 




Br 


239 




Br 


240 


Pl,Te 


DM,FK,Ki,Ov,Sp 


241 


FB 


DM,He,Ki,Ov,Sp


Table XI



Seq Id No 


Subcellular localization 


7 


nuclear 


13 


extracellular, including cell wall 


20 


mitochondrial 


21 


nuclear 


26 


nuclear 


35 


nuclear 


37 


endoplasmic reticulum 


38 


extracellular, including cell wall 


39 


endoplasmic reticulum 


41 


endoplasmic reticulum 


59 


endoplasmic reticulum 


70 


nuclear 


71 


nuclear 


72 


nuclear 


78 


nuclear 


98 


nuclear 


99 


nuclear 


105 


mitochondrial 


108 


endoplasmic reticulum 


116 


mitochondrial 


117 


mitochondrial 


134 


nuclear 


135 


nuclear 


137 


mitochondrial 


159 


nuclear 


160 


nuclear 


161 


nuclear 


171 


nuclear 


178 


endoplasmic reticulum 


182 


nuclear 


184 


nuclear 


185 


endoplasmic reticulum 


186 


nuclear 


187 


nuclear 


188 


nuclear 


194 


nuclear 


195 


nuclear 


196 


nuclear 



200


mitochondrial


204


nuclear


205


nuclear


206


nuclear


211


nuclear


212


nuclear


213


nuclear


214


endoplasmic reticulum


215


endoplasmic reticulum


216


endoplasmic reticulum


218


nuclear


220


endoplasmic reticulum


224 


nuclear 


225 


nuclear 


230 


mitochondrial 


231 


mitochondrial 


238 


cytoplasmic


Table XII



Seq Id No in priority applications


Internal designation


Seq Id No in present application


119 


119-003-4-0-C2-CS


1 


220 


105-016-1-0-D3-CS 


2 


345 


105-016-3-0-G10-CS 


3 


334 


105-026-1-0-A5-CS


4 


159 


105-031-1-0-A11-CS


5 


219 


105-031-2-0-D3-CS


6 


250 


105-035-2-0-C6-CS 


7 


217 


105-037-2-0-H11-CS 


8 


340 


105-053-4-0-E8-CS 


9 


115 


105-074-3-0-H10-CS 


10 


31 


105-089-3-0-G10-CS 


11 


198 


105-095-2-0-G11-CS 


12 


154 


106-006-1-0-E3-CS


13 


366 


106-037-1-0-E9-CS.cor


14 


366 


106-037-1-0-E9-CS.fr


15


79 


106-043-4-0-H3-CS 


16 


95 


110-007-1-0-C7-CS


17 


364 


114-016-1-0-H8-CS 


18 


246 


116-004-3-0-A6-CS


19 


187 


116-054-3-0-E6-CS


20 


203 


116-055-1-0-A3-CS


21 


298


116-055-2-0-F7-CS 


22 


277 


116-088-4-0-A9-CS 


23 


41 


116-091-1-0-D9-CS 


24 


353 


116-110-2-0-F4-CS 


25 


78 


116-111-1-0-H9-CS


26 


245 


116-111-4-0-B3-CS 


27 


104 


116-115-2-0-F8-CS 


28 


259 


116-119-3-0-H5-CS 


29 


269 


117-001-5-0-G3-CS 


30


166 


145-25-3-0-B4-CS.cor 


31 


166 


145-25-3-0-B4-CS.fr 


32 


169 


145-56-3-0-D5-CS 


33 


312 


145-59-2-0-A7-CS 


34 


273 


157-15-4-0-B11-CS 


35 


190 


160-103-1-0-F11-CS


36 


244 


160-37-2-0-H7-CS 


37


151


160-58-3-0-H3-CS 


38 


149 


160-75-4-0-A9-CS 


39 


307 


174-10-2-0-F8-CS 


40 


264 


174-33-3-0-F6-CS 


41 


168 


174-38-1-0-B6-CS


42 


202 


174-38-3-0-C9-CS 


43 


28 


174-39-2-0-A3-CS 


44 


331 


174-41-1-0-A6-CS


45 


258 


174-5-3-0-H7-CS 


46 


84 


174-7-4-0-H1-CS 


47 


294 


175-1-3-0-E5-CS.cor


48 


294 


175-1-3-0-E5-CS.fr


49 


310 


180-19-4-0-F4-CS 


50 


311 


181-10-1-0-D10-CS


51 


263 


181-16-1-0-G7-CS


52 


304 


181-16-2-0-A7-CS 


53 


109 


181-20-3-0-B5-CS 


54 


121 


181-3-3-0-B8-CS 


55 


181 


181-3-3-0-C9-CS 


56 


191 


182-1-2-0-D12-CS 


57 


193 


184-1-4-0-C11-CS 


58 


192 


184-4-1-0-A11-CS


59 


116 


187-12-4-0-A8-CS 


60 


268 


187-2-2-0-A3-CS 


61 


123 


187-31-0-0-f12-CS


62 


234 


1 87-34-0-0-1 12-CS 


63 


185 


187-37-0-0-c10-CS


64 


279 


1 87-38-0-0-1 10-CS 


65 


114 


187-39-0-0-k12-CS


66 


211 


187-41-0-0-i21-CS 


67 


236 


188-11-1-0-B3-CS


68 


35 


188-18-4-0-A9-CS 


69 


299 


188-28-4-0-B12-CS.cor 


70 


299 


188-28-4-0-B12-CS.fr 


71 


72 


188-28-4-0-D4-CS 


72 


242 


188-41-1-0-B8-CS.cor


73 


242 


188-41-1-0-B8-CS.fr


74 


173 


188-45-1-0-D9-CS


75


106 


188-9-2-0-E1-CS 


76 


130 


105-079-3-0-A11-CS 


77 


323 


105-092-1-0-H7-CS


78 


160 


105-141-4-0-H9-CS 


79


272


109-013-1-0-B9-CS


80 


226 


110-008-4-0-D9-CS 


81 


333 


114-001-3-0-A2-CS


82


315 


114-028-2-0-C1-CS


83 


300 


114-032-1-0-H10-CS


84 


57 


114-043-2-0-A10-CS 


85 


137 


114-044-1-0-C5-CS 


86 


107 


116-003-3-0-D10-CS 


87 


164 


116-003-3-0-G12-CS 


88 


108 


116-011-2-0-F11-CS


89 


101 


116-033-3-0-E4-CS 


90 


157 


116-041-4-0-B6-CS 


91 


75 


116-044-2-0-C4-CS 


92 


322 


116-075-1-0-E6-CS


93 


124 


116-094-4-0-G5-CS 


94 


289 


117-005-3-0-F2-CS 


95 


122 


121-007-3-0-D9-CS 


96 


208 


145-91-3-0-D10-CS 


97 


282 


157-17-1-0-F4-CS 


98


129 


160-11-3-0-G8-CS


99 


317 


160-24-1-0-F12-CS


100 


308 


160-24-2-0-E9-CS


101 


25 


160-25-4-0-D2-CS


102 


243 


160-31-3-0-A11-CS 


103 


346 


160-32-1-0-F6-CS


104 


60 


160-37-1-0-A3-CS 


105 


305 


160-40-3-0-E9-CS


106 


48 


160-58-3-0-E4-CS 


107 


238 


160-85-3-0-D4-CS


108 


251 


160-95-3-0-A11-CS 


109 


196 


162-10-4-0-F9-CS.cor 


110 


196 


162-10-4-0-F9-CS.fr 


111 


347 


174-13-2-0-E4-CS 


112 


77 


174-46-2-0-B11-CS


113 


188 


179-8-2-0-A6-CS 


114 


235 


180-22-3-0-B6-CS 


115 


45 


181-13-1-0-F7-CS


116 


265 


181-15-4-0-F7-CS 


117 


280 


181-20-1-0-G7-CS


118 


281 


184-15-3-0-D1-CS 


119 


39 


187-12-2-0-G11-CS 


120 


165 


187-2-2-0-A12-CS 


121


79£


1 87 7n n a U77 r^c 


1 OO 

Izz 


^7f) 


1 87 7A n n p1Q r^Q 


123 


J\JO 


1 87 78 a a ri79 


1 O/l 

lz4 


71 


1 87 70 n n ko r'c 


1 oc 

Izj 


994 


1 87 7Q a a rr£ r^e 


1 O/C 

lzo 


on 
yu 


187 /i^ a a no r*Q 


1 oo 

127 


Z 1 0 


1 Q7 /I C A A »vi0 1 PC 
1 o /-4j-U-U-mz 1 -to 


1 o o 

128 


SI 


1 OO /1C A A PC 


129 


7A9 
J4Z 


1 87 A£L A A -fOI PC 

1 o /-4o-U-U-iz3-CS 


130 


9A9 

zoz 


1 07 C 1 A AH PC 

lo /-j- 1 -U-Alz-CS 


I3l [ 


9^7 

Z J / 


1 07 CIA T7/: PC 


132 


707 
Zyj 


1 07 C 7 A "DO PC 

i o /-j-z-U-DZ-Lb 


1 O 1 

133 


971 


1 07 c i a r^c PC 


134 


987 
Zo / 


1 07 C 1 A A PC 


IOC 

135 


79S 

JZ J 


1 87 1 A "DO /^C 


136 


700 


1 o /-0-4-U-L, 1 U-Lo 


1 O T 

137 


7^0 


1 00 1 O 7 A PQ po 

1 oo- i y-Z-U-Co-Cb 


1 o o 

138 


oo 


1 88 77 A A /^/C oo 
1 oo-ZZ-4-U-vjrO-to 


139 


977 

Zjj 


188 78 /I A T^1 1 r*Q 
ioo-Zo-4-U-JJl l-Lo 


1 /1A 

140 


7£Q 

joy 


180 70 1 A IT 1 A PO 

1 oo-ZV- 1 -V-tL 1 U-Lo 


141 


i ^ 


1 oo 7/1 /i a lire po 


142 


797 


1 00 G 1 A A C PC 


143 


787 
Z53 


1 AC AO 1 1 A PI P o 

1 lO-Uz 1 -3-0-C3-CS 


144 


90 

zy 


1 AC AIT /I A TT10 PO 

1Uj-U3 /-4-0-Hlz-CS 


145 


1 nn 


1 AC A71 OA AO 


146 


00 


1 AO aao A A p/: po 

1 Uy-00z-4-0-C6-CS 


147 


7£A 


1 AO AAT 1 A P/1 PO 

1 Uy-UUi - 1 -U-Cj4-Co 


148 


77 1 
jZ 1 


11/C11Q/1A AO PO ' 

1 lo-l 18-4-0-A8-CS 


149 


1 7A 
1ZU 


1 A C CO O A TM O PO 

140OZ-Z-U-D12-CS 


150 


77A 
ZJU 


1/1cooapc po 
143- /-Z-U-LrJ-CS 


151 


1 77 
ill 


/I C 7 7 A T^Q PC 

14J-/-3-U-D3-LS 


1 

152 


A'X 


1 CO 1 O O A PC 

1 J /-l /-Z-U-L. 1 -Co 


153 


7<n9 

JJZ 


1 /£A 1A1 Q A UO PO 


154 


zl7 
H 1 


<A lO 1 A T~M A PO 

1 0U- 1 Z- 1 -U-D 1 U-Cb 


tec I 

155 


1 o<; 


1 ^A OO /I A P/1 PO 

1 oU-Zo-4-U-C4-Co 


1 C f 

156 


LA A 
744 


1 o(J-3 1 -3 -0-h4-CS 


157 


£1 


• zTA /1A 1 A T T /I PC 

1 OU-4U- 1 -0-H4-CS 


158 


237 




1 

1 J7 


32 


160-88-3-0-A8-CS.cor 


160 


32


160-88-3-0-A8-CS.fr 


161 


97 


160-99-4-0-E4-CS 


162 


249


161-5-4-0-B6-CS 


163


218


174-17-1-0-D6-CS 


164 


266 


174-32-4-0-F8-CS 


165


161 


174-38-4-0-D11-CS


166 


113 


174-8-2-0-C10-CS


167 


255 


179-14-2-0-F11-CS 


168


24 


179-9-4-0-B8-CS 


169 


128 


181-10-1-0-C9-CS 


170 


58 


187-5-3-0-C7-CS 


171


358 


188-26-4-0-F5-CS 


172


171 


188-27-3-0-G1-CS 


173 


98 


188-29-2-0-H1-CS 


174


133 


188-31-1-0-E6-CS


175


49 


188-45-1-0-D3-CS 


176 


42 


188-5-1-0-H6-CS


177 


148 


188-9-1-0-C10-CS


178 


319 


105-016-3-0-C5-CS 


179 


254 


105-026-4-0-D9-CS 


180 


55 


105-053-2-0-D9-CS 


181


89 


105-069-3-0-A11-CS 


182


212 


105-076-4-0-F6-CS 


183 


143 


105-135-2-0-F9-CS 


184 


179 


106-023-4-0-F6-CS


185 


54 


110-001-3-0-C11-CS


186 


354 


110-002-3-0-F9-CS


187 


297 


114-019-3-0-D9-CS 


188 


291 


114-029-1-0-C6-CS


189 


30 


114-032-4-0-B1-CS 


190


363 


114-070-2-0-H4-CS


191 


194 


116-016-3-0-F11-CS


192 


76 


116-022-4-0-G2-CS 


193 


232 


116-052-2-0-H8-CS


194 


139 


116-053-4-0-B4-CS 


195 


145 


116-094-3-0-H2-CS


196 


206 


116-112-4-0-C7-CS


197 


88 


116-123-3-0-F12-CS


198 


69 


123-008-1-0-C5-CS 


199 


285 


145-53-2-0-H8-CS 


200 


96 


145-57-2-0-C9-CS.cor


201 


96 


145-57-2-0-C9-CS.fr 


202


26 


145-7-3-0-B12-CS 


203 


261 


157-12-2-0-D1-CS


204 


349 


157-16-2-0-D5-CS 


205


92


157-18-2-0-A7-CS


206 


81


160-103-1-0-B10-CS


207


74


160-104-4-0-F3-CS


208 


66 


160-22-2-0-D10-CS


209 


270 


160-24-3-0-F12-CS


210 


318 


160-3-2-0-H3-CS 


211 


215 


160-58-2-0-A2-CS


212 


214 


160-73-1-0-B4-CS


213 


94 


160-75-4-0-E6-CS


214 


73 


160-97-3-0-E9-CS


215 


296 


174-1-4-0-E9-CS 


216 


127 


174-12-4-0-C2-CS 


217 


350 


180-19-4-0-H2-CS 


218 


320 


181-10-4-0-G12-CS 


219 


292 


181-3-2-0-F6-CS 


220 


125


181-4-4-0-A12-CS


221 


91 


181-9-2-0-F12-CS.cor


222 


91 


181-9-2-0-F12-CS.fr


223


338 


184-13-3-0-E11-CS 


224 


102 


184-4-2-0-D3-CS 


225 


105 


184-7-1-0-E7-CS


226 


112 


184-8-4-0-G9-CS 


227 


267 


187-10-3-0-G9-CS 


228 


260 


187-32-0-0-m20-CS 


229 


343 


187-32-0-0-n21-CS.cor


230


343 


187-32-0-0-n21-CS.fr 


231 


341 


187-4-2-0-E6-CS 


232 


59 


187-40-0-0-i15-CS


233


111 


187-47-0-0-g24-CS 


234 


40 


187-9-3-0-A2-CS


235 


355 


188-26-4-0-H1-CS 


236 


J 1 


1 00-JJ-J-U-U7-LO 


237


180 


188-38-4-0-D8-CS 


238 


152 


188-41-1-0-E6-CS


239 


93 


188-42-2-0-F3-CS.cor 


240 


93 


188-42-2-0-F3-CS.fr 


241 


